Sunday, June 3, 2012

Glue Conference 2012

Glue Conference 2012 took place at the Omni Interlocken Hotel in Broomfield, CO on May 23 and 24. Gluecon is an information-packed developer conference that focuses on cloud, mobile, APIs, big data, and most importantly, developers. Some of the topics included NoSQL, node.js, HTML5, backend-as-a-service, cloud management and security, cloud storage, Hadoop, DevOps, mobile app development, and cloud platforms.

I attended the conference with sponsorship (full ride) from FullContact. These guys were unbelievably gracious and showed me a great time while I was out there. I came in contact with them when Bart Lorang, CEO of FullContact, emailed me to set up a time to talk with him and his engineering team about a paper I had published at a KDD'11 workshop. After meeting with the guys and talking shop, I found out that they are solving the same real-world problems (at world scale) that I was working on in my graduate research (at individual scale).

Some of the more interesting presentations/demos of the conference included:

FullContact's Dan Lynn gave a presentation on Storm
The title of the presentation is Storm - The Real-Time Layer Your Big Data Has Been Missing.  The problem with big data that is constantly changing is that your processing jobs are typically done in batch processing, and while this works and is usually perfectly acceptable, batch processing operates over a snapshot in time of your data.  If you want to get the most accurate, and most up-to-date picture of your data, real-time processing is what you want.  Storm is a new framework for real-time computation on big data that operates using new concepts of streams, spouts, tuples and bolts.
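To make the vocabulary concrete, here is a toy sketch of the spout, bolt, tuple, and stream ideas using plain Python generators. This is not Storm's API and nothing here is distributed; the function names and the word-count task are my own assumptions, chosen only to show how tuples flow from a source through processing steps as they arrive rather than in a batch.

```python
from collections import Counter

def sentence_spout(sentences):
    """Spout: the source of a stream; emits one tuple per sentence."""
    for sentence in sentences:
        yield (sentence,)

def split_bolt(stream):
    """Bolt: consumes sentence tuples and emits one tuple per word."""
    for (sentence,) in stream:
        for word in sentence.lower().split():
            yield (word,)

def count_bolt(stream):
    """Bolt: keeps a running count and emits (word, count) as each word arrives."""
    counts = Counter()
    for (word,) in stream:
        counts[word] += 1
        yield (word, counts[word])

def run_topology(sentences):
    """Wire spout -> split bolt -> count bolt and return the latest counts."""
    latest = {}
    for word, count in count_bolt(split_bolt(sentence_spout(sentences))):
        latest[word] = count  # the picture of the data is updated tuple by tuple
    return latest
```

Calling `run_topology(["the cat", "the dog"])` walks each tuple through the chain one at a time, which is the contrast with running a batch job over a snapshot.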

EmergentOne makes it ridiculously easy to launch an API. Generate a complete and customized REST API for an existing application in minutes using a GUI interface.  I saw a demo of this hooked up to a world country MySQL database.  Within minutes the guy had created an API that I could hit over the internet.
Tempo is a purpose-built database used to store and analyze massive streams of time-series data. Think of the internet of things here, where each thing is generating data whose most important attribute is the time-stamp. From their site: "TempoDB is the first purpose-built data layer that enables the scalable storage and instant analysis of your time-series streams, so that you can learn from the past, understand the present, and predict the future."
Another startup in the Demo Pod offers a service used to guard your website against unauthorized web scraping, competitor data mining, and more, without impeding your end user. It was the winner of the Demo Pod, which contained 12 new startups competing against each other for this title (FullContact was the winner of the Demo Pod at GlueCon last year). While it will be welcomed by many a content generator on the internet, it flies in the face of web scrapers like myself and FullContact, who harness the massive amounts of information on the internet in order to aggregate the data into a meaningful product. I'm still skeptical that they could prevent the scraping used in ArchiveFacebook.
Shout out to Robbie Jack and Kyle for showing me a great time in Boulder, CO the Friday after the conference. We had fun bar hopping and playing Werewolf at the TechStars Boulder HQ. I'm definitely going to have to try and come back for next year's GlueCon.

Benefficient 1.0

Benefficient for Android was launched in the Play Store (or whatever the hell they call it these days) today. Benefficient was created because I have seven credit cards, each with their own set of rewards that I can never seem to keep straight. Why do I have seven credit cards? That's a good question, let me explain. I make all my purchases with credit cards in order to earn money from the rewards they offer. I get different rewards for each card, and by using the right card for the right purchase I can maximize these rewards, in some cases getting 5% cash back per purchase! I've been tracking these rewards over the last two and a half years and I've averaged about $1,300 cash-in-pocket per year. I realize this approach isn't for everyone because it requires the discipline to pay the balance in full each month (otherwise the interest would wipe out any and all rewards accrued), but if you have the financial discipline for this, you can make a lot of money.

So, why is it hard to remember the rewards on my cards? Let me demonstrate. My American Express Blue Card gets 5% cash back on gas, groceries and pharmacies and 1% cash back on everything else. However, that is only after I have spent $6,500 on the card for that year. Before that threshold is met, I get 1.5% cash back on those categories and 0.5% on everything else. My Discover More Card gets 5% cash back on rotating categories that change every one to three months. This card also has category limits at which the reward stops after, say, $750 spent in one particular category.

It gets even more confusing when you introduce points/miles into the mix. My Citi Hilton HHonors Visa card gets 6 HHonors points per dollar at Hilton Brands hotels, 3 HHonors points on groceries, pharmacies and gas, and 2 HHonors points for all other purchases. However, HHonors points don't exactly match up to cash back percentages. In order to compare point-accruing credit cards to cards that accrue cash back, you need to assign a cash value to the points. With this particular card, since you can only redeem the points by booking Hilton Brands hotel stays, I've assigned a lower value to the points. Specifically, each point is worth half of a penny. In other words, 6 HHonors points per dollar spent would equal 3% cash back. This is my only card of the 7 where I can't redeem for cash, but I like the card because I only use Hilton Brand hotels (the Embassy Suites manager's reception with free alcohol pays for itself!) and I know that I will redeem the points that I accrue.
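The conversion above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how Benefficient is implemented; the function names and the card labels are made up for the example, and the half-a-penny point value is just my own valuation from the paragraph above.

```python
# Assumed valuation: each HHonors point is worth half a cent.
POINT_VALUE_CENTS = 0.5

def effective_cash_back_pct(points_per_dollar, point_value_cents=POINT_VALUE_CENTS):
    """Cents earned per dollar spent, which is the same number as percent cash back."""
    return points_per_dollar * point_value_cents

def best_card(rates_pct):
    """Return the card label with the highest effective cash-back percentage."""
    return max(rates_pct, key=rates_pct.get)

# Grocery purchase, using the rates described above (labels are illustrative).
grocery_rates_pct = {
    "Amex Blue (after $6,500 spent)": 5.0,
    "Citi Hilton HHonors Visa": effective_cash_back_pct(3),  # 3 points per dollar
}

# Hilton hotel stay: 6 points per dollar versus 1% "everything else" on the Amex.
hotel_rates_pct = {
    "Amex Blue (after $6,500 spent)": 1.0,
    "Citi Hilton HHonors Visa": effective_cash_back_pct(6),
}
```

With these numbers `best_card(grocery_rates_pct)` picks the Amex and `best_card(hotel_rates_pct)` picks the Hilton card, which is exactly the kind of per-purchase switch that is hard to keep in your head.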

From these scenarios you can begin to see that this is too much information for the average person to remember. Some people are really good at remembering this type of stuff, like the Points Guy, who travels frequently and blogs about how he games the travel miles system, but this is his career (or at least his side job). I, however, don't cut coupons or stand in long lines for Black Friday. In fact, I actually don't even spend that much time paying attention to credit card rewards. All I do is think for a second or two before I make a purchase about what card gets me the best reward. Many times I am wrong. Actually, I've found by using Benefficient in testing over the last month or so that I am wrong a lot more than I thought!

According to my Mint account, I spent about $57k last year, which would make the $1,100 in rewards I redeemed in cash last year about 1.9%.  Considering that I get 2% cash back on all purchases made with my FIA Rewards Card from American Express, and I get 5% cash back on other categories, I should have accrued around 3% (or $1,710) cash back if I had properly optimized!
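For anyone who wants to check the math, here is the same calculation as a short Python sketch. The dollar figures are the ones quoted above; the 3% target is my own rough estimate, not a measured number, and the variable names are just for illustration.

```python
annual_spend = 57_000      # approximate spend reported by Mint
redeemed_rewards = 1_100   # cash rewards actually redeemed last year
target_rate_pct = 3.0      # assumed achievable rate with proper optimization

# Rewards actually earned, as a percentage of spend (about 1.9%).
actual_rate_pct = redeemed_rewards / annual_spend * 100

# Rewards at the target rate, and the money left on the table.
target_rewards = annual_spend * target_rate_pct / 100
missed_rewards = target_rewards - redeemed_rewards

print(f"actual rate: {actual_rate_pct:.1f}%")
print(f"target rewards: ${target_rewards:,.0f}")
print(f"missed: ${missed_rewards:,.0f}")
```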

That's the beauty of Benefficient: you don't have to be like the Points Guy, or the coupon-cutter, or the Black Friday shopper. You can make purchases like normal and rely on Benefficient to do the heavy lifting! It's that simple.

Sunday, August 28, 2011

Android + LastPass is a Match Made in Heaven

Since the invention of the smartphone there's been one thing that has irked almost every user: password management. Getting passwords in the browser was a challenge for a while, but that has pretty much been handled by browser-based password managers. Up until this point, however, the user/password problem for apps has not been solved. Every time I install Facebook or Zillow, I am presented with a username and password challenge. If it were just as easy as remembering the password and then inputting it, it wouldn't be that bad. The problem is, I can't ever remember my passwords to social media sites. They are not important enough to me to remember. It's completely bothersome to load up an app for the first time only to be presented with a blank password form.

Finally, there is a solution. LastPass Plus gives the ability to input usernames and passwords into apps. It does this by using a custom keyboard. Is it a perfect solution? No. Does it work? Yes. The keyboard itself is not nearly as good as the stock Android keyboard, so it's annoying to use for other apps, but what I do is use it only for the time that I need it and then switch back to stock when I don't. Once again, it's not perfect, but it works. I don't see another way to do it with the present state of Android. The only other way to do it would be with Intents, but I'm pretty sure Intents don't provide the security needed to be requesting and sending passwords back and forth.

Until Android gets serious about password management, LastPass has a working solution.  I am very pleased that this solution is available to me as it solves a nagging problem.

Trip Report: KDD 2011

The SIGKDD 2011 conference took place August 21 - 24 at the Hyatt Manchester in San Diego, CA.  Researchers from all over the world interested in knowledge discovery and data mining were in attendance.  This conference in particular has a heavy statistical analysis flavor and many presentations were math intensive.

I was invited to present my master's project research at the Mining Data Semantics (MDS2011) Workshop of KDD. In this paper, we present an approach to find social media profiles of people from an organization. This is possible due to the links created between members of an organization. For instance, co-workers or students will likely friend each other, creating hyperlinks between their respective accounts. These links, if public, can be mined and used to disambiguate other profiles that may share the same names as those individuals we are searching for. The following figure shows the number of profiles found from the ODU Computer Science student body for each respective social media site and the links found between them.

This picture represents the actual students themselves and the links between them.  Black nodes are undergrads, green nodes are grads, and red nodes are members of the WS-DL research group.

These are the slides:
An Unsupervised Approach to Discovering and Disambiguating Social Media Profiles

Here is the paper:
MDS 2011 Paper: An Unsupervised Approach to Discovering and Disambiguating Social Media Profiles

View more documents from carlton.northern.
I've synopsized some of the interesting presentations from the conference:

Stephen Boyd - Stanford University "From Embedded Real-Time to Large-Scale Distributed". Stephen Boyd's talk focused on his current research area of convex optimization. He explained that convex optimization is a mathematical technique in which many complex problems of model fitting, resource allocation, engineering design, etc. can be transformed into a simple convex optimization problem, solved, and then transformed back into the original problem to get the solution. He went on to explain how this can be applied at every scale, from real-time embedded systems such as a hard disk drive head seek problem to large distributed systems such as California's power grid.

Amol Ghoting - IBM "NIMBLE: A Toolkit for the Implementation of Parallel Data Mining and Machine Learning Algorithms on MapReduce". With Hadoop you write a map function and a reduce function, where you can map anything to a (key, value) pair. The problem with Hadoop is that it has a two-stage data flow which can be cumbersome for programming. Also, job scheduling and data management are handled by the user. Lastly, code reuse and portability are diminished. This toolkit tries to make the key features of Hadoop available to developers but without a Hadoop-specific implementation. NIMBLE actually decouples algorithm computation from data management, parallel communications and control. It does this by using a series of basic datasets and basic tasks that create a DAG. Tasks can spawn other tasks. With this structure in place, simultaneous data and task parallelism is achievable.
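For readers who haven't seen the two-stage map/reduce data flow, here is a minimal single-machine sketch in Python. It is neither Hadoop's nor NIMBLE's API; `map_fn`, `reduce_fn`, and `map_reduce` are names I made up, and word counting is just the usual toy task.

```python
from collections import defaultdict

def map_fn(line):
    """Map stage: turn one input record into (key, value) pairs."""
    for word in line.lower().split():
        yield (word, 1)

def reduce_fn(key, values):
    """Reduce stage: combine all values that share a key."""
    return (key, sum(values))

def map_reduce(records, mapper, reducer):
    """Run the mapper, group the pairs by key, then run the reducer per key."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())
```

Calling `map_reduce(["a b a", "b c"], map_fn, reduce_fn)` counts each word across both lines. Anything that doesn't fit this map-then-reduce shape has to be chained as multiple jobs, which is the rigidity the talk was complaining about.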

David Haussler – UC Santa Cruz “Cancer Genomics”.  DNA sequencing cost has dropped dramatically.  DNA sequencing was following Moore’s law but is now reducing in cost 10-fold every two years.  We can now cheaply sequence entire genomes.  They created the Cancer Genome Atlas.  10,000 tumors will be sequenced in the next two years using this Atlas.  Cancer genome sequencing will soon be a standard clinical practice.  Because each person's DNA is different, and each tumor resulting from a person's DNA is different, a huge computational processing problem looms in the near future.

Ahmed Metwally - Google. "Estimating the number of people behind an IP Address". Most research assumes that there is 1 person using 1 IP address, but this is not the case. The number of users behind an IP also changes over time; for instance, a hotel hosting a conference will have many more users than usual sharing the same IP address. So, how would one estimate the number of these users in a non-intrusive way? One method is to look at trusted cookie counts. Another method is to look at diverse traffic. Google caps traffic volume per IP to stop people from gaming the system using the same IP address. Google knows how many users share an IP address when they are logged in with a username and password to Google's sites. However, some of Google's traffic is from users that don't have a Google account. This research is for those who want to filter users without asking them for any identification, thus preserving their privacy. This method is currently being used at Google for detecting click fraud.

D. Sculley - Google "Detecting Adversarial Advertisements in the Wild". An adversarial advertiser would be an advertiser that uses Google AdWords or AdSense to advertise misleading products like counterfeit goods or scams. Most ads are good; only a small number are bad. Google uses in-house trained people to hand-build rule-based models. Allowing these people to hand-build the rules gave a great incentive and improved morale rather than just having them do repetitive tasks over and over again. Automated methods are being used as well, but this part of the presentation went right over my head.

Chunyu Luo - University of Tennessee "Enhanced Investment Decisions in P2P Lending: An Investor Composition Perspective". In this paper, they are trying to decide which loans are worth investing in; in other words, what makes a good loan? They use a bipartite investment network with investors on one side, investees on the other, and loans as the edges between them. Each loan can be considered a composition of many investors. The idea is that by looking at the past performance of the other investors of a given loan, you can improve your prediction of the return rate for that loan. They performed an experiment on a P2P lending dataset. The composition method far outperformed the average return on investment.

Susan Imberman - College of Staten Island "From Market Baskets to Mole Rats:  Using Data Mining Techniques to Analyze RFID Data Describing Laboratory Animal Behavior".  This paper presents the data mining techniques used in analyzing RFID data from a colony of Mole Rats.  Much like we use RFID in cars for tolls like EZ Pass, they are using RFID on Mole Rats and when they pass specific points of the colony (a series of pipes and rooms) they collect that sample.  They used k-means clustering which showed animal place preference.  Used an adjacency matrix to get an idea of which Mole Rats liked to be near one another.  This created 3 distinct sub graphs which corresponded well to the different colony structure of Mole Rats, queen workers, large workers and small workers.  Next they correlated common transactions made in the grocery store with items in a basket to repeat behavior of Mole Rats.

After the conference ended on Wednesday, Hurricane Irene was on track for a direct hit to Hampton Roads.  My flight was scheduled to arrive in Norfolk around 11pm which was cutting it very close to the storm hitting.  So I decided to extend the trip till Monday and ride out the storm here in sunny San Diego.  In total, I managed to miss a hurricane, a tornado, an earthquake, and a swamp fire.  I think I made a good decision...

Wednesday, March 23, 2011

How To: Permanent Root and Flash CyanogenMod on Newer MyTouch 4G's

So, I purchased my MyTouch 4G back in October of last year and had no problem rooting it using the methods that came out in November. However, I just got another brand new MyTouch 4G through work that also needed to be rooted and I ran into big problems. After about 6 hours of tinkering I finally found out that the new phone had a non-engineering HBoot. This means that there isn't as much flexibility in what you can and can't do with it and you have a greater chance of bricking if something goes wrong. So, how do you know which HBoot you have? Simple: reboot the phone into the bootloader by holding the power and volume-down buttons. Then the bootloader screen will say either 0.86.0000 or 0.85.2007. If it says 0.85 then you have the engineering HBoot and any permanent root method should work. If it says 0.86 then you don't have an engineering HBoot and you need to use this method. To be clear, I didn't create these methods; I am simply aggregating the three steps needed to perm root and load CyanogenMod.
This tutorial assumes you are familiar with adb, if not look here.  If you are running a 64 bit version of Windows you may have problems loading the adb driver.  This thread should provide the solution.

Then this method will perm root the phone:

The instructions below were extracted from the link above for convenience. I adapted them for the 0.86 bootloader situation.

Download this file: New version of gfree with more options! See below.
md5sum: b73c56ca0e21664c5756d4ad295063c5
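Since a corrupted download is a bad thing to push to a phone you are about to patch, it is worth checking the md5sum first. Here is a small Python sketch of one way to do it; the function names are my own, and whatever path you pass in (for example a hypothetical `gfree.zip`) depends on where you saved the download.

```python
import hashlib

# Checksum published above for the gfree download.
EXPECTED_MD5 = "b73c56ca0e21664c5756d4ad295063c5"

def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_expected(path, expected=EXPECTED_MD5):
    """True if the file's MD5 equals the expected checksum (case-insensitive)."""
    return md5_of_file(path) == expected.lower()
```

If `matches_expected(...)` returns False for your download, don't continue; download the file again.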

1. Now unzip the file into your SDK tools directory.

2. Plug your device into your computer.

3. Now open explorer and hold down shift at same time you right mouse click on your SDK tools directory (platform-tools if you have the R8 version of the SDK). Select open command window here. If you are in linux (ubuntu) right mouse click on your SDK tools folder, choose actions, and choose open command window(or whatever it's called). Otherwise, open a command prompt and cd your way to your SDK tools directory.

4. Type "adb push gfree /data/local" and hit enter.

Optionally, you could download the file to your phone, use androzip or something like it to unzip the file, and then use root explorer to move the file named gfree to /data/local. Not the gfree.h file. All the other files are source code included for the GNU license. This would skip the first 4 steps.

5. Now unplug your device from the computer.

6. Run visionary to gain temp root. (If you were already permarooted w/s=off ignore this step.)

7. Open terminal emulator on your device, type "su", and hit enter to gain root privileges.

8. Type "cd /data/local" and hit enter.

9. Now type "chmod 777 gfree" and hit enter to make the program executable.

10. Type "./gfree -f" and hit enter.

New features in gfree.
gfree usage:
gfree [-h|-?|--help] [-v|--version] [-s|--secu_flag on|off]
-h | -? | --help: display this message
-v | --version: display program version
-s | --secu_flag on|off: turn secu_flag on or off
-c | --cid : set the CID to the 8-char long CID
-S | --sim_unlock: remove the SIMLOCK

With the new features you can turn off one thing at a time. You can also turn security back on and set the CID back to stock if you wish. To turn simlock back on you still have to follow the revert procedures on this page as the information that is patched to turn the lock off is encrypted and we can't write back to it other than restoring the entire image.

So, if you wanted to leave simlock on but turn security off and set super CID the command would be "./gfree --secu_flag off --cid 11111111" + enter. The -f switch after ./gfree that is now in the above step (./gfree -f) just tells gfree to patch everything.

11. Wait for the program to finish and then reboot into HBoot to see if S=Off. Also, check your bootloader version. If it says s=off and has bootloader version 0.86.0000 it worked.

12. Run visionary again (temproot w/set system r/w after root checked and then attempt permroot) to make root privileges permanent and then reboot again. Now "su" should work properly for you.

Gfree writes a backup of the file that it patches named Part7backup-numbers.bin on your sdcard. I suggest putting this file in a safe place as it is the only way to revert if you need to.

Next, follow these steps to flash the engineering bootloader:
1. Restart your phone and plug it back into your computer.

2. Download this file:
md5sum: df4fd77f44993eb05a4732210d2eddc6

3. Copy the file to your SDK tools directory.

4. Open a command prompt again and cd into your SDK tools directory (platform-tools if you're on the new R8 SDK).

5. Type "adb push hboot_dhd.nb0 /data/local" and hit enter.

6. Now open terminal on your device and type "su" and enter to gain root privileges.

7. Type "cd /data/local" and enter.

8. Now type "dd if=hboot_dhd.nb0 of=/dev/block/mmcblk0p18" and hit enter.

9. You should see something like: 2048+0 records in, 2048+0 records out, 1048576 bytes copied, blah blah blah.

10. Now restart the device into hboot and check if your bootloader version is 0.85.2007. That is what you want to see.

Congratulations, you now have a TRUE root and engineering bootloader on your shiny MT4G!!!

Once CyanogenMod is loaded, you will need to pull out and then replace the battery to get WiFi working.

Enjoy CyanogenMod!!

Friday, March 4, 2011

Total Cost of Ownership for the iPad 2

So, with the announcement of the new iPad 2 there has been a lot of talk of its competition and how the iPad 2 stacks up. What is incessantly compared are the tech specs, apps, and initial cost, but what is almost never discussed is the total cost of ownership. The technical specs are of course important, and on that front, I'd say the iPad 2 and the XOOM (the only real competition to the iPad 2 currently) are evenly matched. The iPad 2 is lighter and thinner and has better battery life, but the XOOM has (or will have with a free hardware upgrade) 4G and a removable SD card (it also has a barometer, but that has limited usefulness).

The iPad series is of course much better positioned in the app space, but that's only because it has a year on Google's first tablet-approved OS, and at the rate the Android team is giving away new XOOMs to developers, I believe it's only a matter of time before there are as many Android tablet apps as there are iPad apps.

The initial cost is a point of contention in most comparisons as well. Depending on the bias of the author towards one platform or the other you get comparisons like these: "The iPad 2 is cheaper than the XOOM with entry level costs of $499 compared to $799." Or, "The XOOM at $799 is comparable in price to the high end iPad 2 at $829." Just recently leaked, however, is new pricing from Sam's Club for a WiFi-only version of the XOOM for $539, which is a cool $60 cheaper than the 32GB iPad 2. So, I think we can safely agree that these two tablets' initial costs are comparable.

The last comparison, and the one that I want to discuss more in depth, is how much it costs to get data, and the limitations of data on these devices. The first option for either is to just sign up for a data agreement. The data prices are comparable, especially considering that you can get data for either of these devices from the same carrier, Verizon... This option is for suckers though. Why pay for data when you more than likely have data on your phone, and both Android and the iPhone support data tethering?

This is where it gets a little tricky. For the iPad to tether from your iPhone you need to purchase the tethering package, which will add an additional $20 to your data plan. The catch is that those of you on grandfathered-in unlimited iPhone data plans have to switch to a limited data plan. Further, the extra $20 that you pay is only for 4GB of data. You could go through 4GB of data in a week watching YouTube, so you may very well end up paying much more than $20.

The answer to all of these absurd charges/limitations for data on the iPad 2 is to forgo the iOS family of products and go with Android. I have a 4G Android phone on T-Mobile, for which I pay $25 a month for unlimited data. I use the hotspot on my phone to share my 4G data connection with my XOOM for $0. I guess I could do the same with an iPad 2, but the point is you need an Android phone, not an iPhone, to do this.

Other useful information to take notice of is that I only pay $30 a month (with my 10% corporate discount) for my cellular service, for which I get 750 minutes (for $10 more I could get 1500). If I'm ever worried about going over my minutes, I can make calls for free using Google Voice over SIP from my XOOM or my phone. I also don't pay anything for text messages because I use Google Voice, which provides free text messaging. In total I pay $55 a month with no contract. Life without strings is great, isn't it?
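Here is that monthly bill added up as a quick Python sketch. The line items and prices are the ones I listed above; the dictionary labels are just for illustration, and the yearly figure simply assumes the prices stay the same for twelve months.

```python
# Monthly line items in dollars, as described above.
monthly_costs = {
    "voice plan (750 minutes, after 10% corporate discount)": 30,
    "unlimited 4G data": 25,
    "hotspot tethering to the XOOM": 0,
    "text messages via Google Voice": 0,
}

monthly_total = sum(monthly_costs.values())
annual_total = monthly_total * 12  # assumes prices stay flat all year

for item, cost in monthly_costs.items():
    print(f"{item}: ${cost}")
print(f"monthly total: ${monthly_total}")
print(f"annual total: ${annual_total}")
```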

Friday, February 25, 2011

20 Things the Motorola XOOM Can Do that the iPad Can't

So once again my favorite tech sparring partner and I are at it.  The screenshot below speaks for itself.    

So, I put together this list (which came quite easy) of 20 things the XOOM can do that the iPad can't:
  1. Google Maps 5 with 3D Maps
  2. Widgets
  3. Live Wallpapers
  4. Cloud Syncing (apps, settings)
  5. Dual core processing
  6. More pixels, better DPI (nicer screen)
  7. 2 Cameras, front and back
  8. Dual Flash on back camera
  9. Video chat
  10. Removable SD Card and battery (update: apparently the battery isn't removable... shame on you Motorola)
  11. HDMI
  12. Improved 1080p video playback
  13. Share Data Connections with Portable WiFi Hotspots (this by the way is how I am getting 4G data to my XOOM, from a hotspot on my MyTouch 4G on T-Mobile)
  14. Improved gaming with a Gyroscope 
  15. Predicts the weather with a Barometer
  16. 720p video capture with Movie Studio to directly edit video on the device
  17. Google Maps Navigation (The iPad WiFi version doesn't even have a GPS)
  18. True multitasking
  19. System Bar (in bottom right corner) for global status and notifications
  20. Application Bar (in top left corner) for contextual options, navigation, widgets, or other types of content