This Facebook Data Set Will Blow Your Mind

by David Evans on February 9, 2010   in Research

Yesterday I mentioned How to split up the US, which it turns out is just a taste of the analysis that researchers will be doing on the massive Facebook dataset, made up of 210 million profiles, that Pete Warden is set to unleash tomorrow. That’s right, yours and mine, with our names intact (and some personally identifiable information removed).

If what people call Web 2.0 was all about creating new technologies that made it easy for everyday people to publish their thoughts, social connections and activities, then the next stage of innovation online may be services like recommendations, self and group awareness, and other features made possible by software developers building on top of the huge mass of data that Web 2.0 made public.

I have been asking the online dating industry to do exactly this for at least five years. It makes me sad to think about all of those lonely hearts who will never find their match because their data is being squandered by companies more interested in profit than actually helping singles discover each other. Just think of all that juicy data hidden behind closed doors.
I bet some Netflix teams could improve online dating efficiency by 10 percent pretty easily. Wait, that’s IntroAnalytics.

If dating profiles were open-source, we would have lots of matching and discovery services built on top of a giant dating dataset. Imagine if the most effective and efficient companies were the ones that won, not the ones with the biggest marketing budgets. I know, crazy talk!
As for niche sites, sure, they are hot right now, but that’s not going to last forever. Make your money while you can, because it’s only going to get harder to win paying customers, especially with changes coming to the e-commerce landscape in the coming months.
Look what Pete says in How to harvest Facebook profiles from emails without logging in:

Recently I was surprised to discover that you don’t need to be signed in to an account to search by email addresses and match them to profiles. To my mind this is a nasty hole both because it gives companies legal cover to resell the linked data, and in practice makes it tough for Facebook to crack down on firms siphoning off user data.

He describes exactly what you have to do to get those profiles. Wow. I wonder how many people have exploited this flaw?

Read ReadWriteWeb’s The Man Who Looked Into Facebook’s Soul for the full details of tomorrow’s launch. I hope the dating industry will at least pay attention to what people are doing with the data. We’re going to see many assumptions crushed and assertions tested, and we will learn a whole lot about ourselves and society in a very short time.

2 comments

Fernando Ardenghi February 9, 2010 at 9:24 pm

“I bet some Netflix teams could improve online dating efficiency by 10 percent pretty easily”

improve online dating efficiency by 10 percent???

That is wasting precious time!

The online dating industry does not need a 10% improvement. It needs “a 100 times better improvement.”

If you check Match or any other site that works as a powerful search engine, you will see that [on average] a person (mostly men) will strongly like 3 or 4 persons per 100 (one hundred) screened, or 30 to 40 per 1,000 (one thousand) screened. That person will then send messages to them, and only [on average] 10% of the recipients (mostly women) will strongly like the sender and reply.
Searching on one’s own therefore yields [on average] 3 or 4 pairs who search for and select each other per 1,000 persons screened.
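
The arithmetic behind that figure can be checked with a quick sketch. The function and parameter names here are made up for illustration, and the inputs are simply the averages quoted in this comment, not measured data.

```python
def mutual_selections_per_1000(liked_per_100: int, reply_percent: int) -> float:
    """Estimate mutual selections per 1,000 profiles screened.

    liked_per_100: profiles the searcher strongly likes per 100 screened
                   (3 or 4 in the comment above).
    reply_percent: share of contacted people who like the sender back and
                   reply, in percent (10 in the comment above).
    """
    liked_per_1000 = liked_per_100 * 10           # 3 per 100 -> 30 per 1,000
    return liked_per_1000 * reply_percent / 100   # keep only those who reply


low = mutual_selections_per_1000(liked_per_100=3, reply_percent=10)
high = mutual_selections_per_1000(liked_per_100=4, reply_percent=10)
print(f"{low:g} to {high:g} mutual selections per 1,000 screened")
```

With the quoted averages this gives the 3 to 4 per 1,000 range stated above.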

If you check PerfectMatch or any other site that works mostly as a matching service based on self-reported data, i.e. a bidirectional recommendation engine (personal preferences, likes and dislikes, ipsative personality tests: MBTI, DISC), you will see that [on average] a person receives 3 or 4 persons recommended for dating purposes per 1,000 (one thousand) persons screened, exactly the same range as searching on one’s own.

If you check eHarmony or any other site like Chemistry, Parship, Be2, Meetic, etc., that works mostly as a compatibility matching algorithm (those sites mostly use different versions of the Big5 normative personality test as their core), you will see that [on average] a person receives 3 or 4 persons as highly compatible for dating purposes per 1,000 (one thousand) persons screened, exactly the same range as searching on one’s own and mutual filtering methods.

If you carefully complete all that homework, you will rediscover what I discovered some years ago, by 2003: “the online dating sound barrier” for Compatibility Matching Algorithms.

Breaking “the online dating sound barrier” is to achieve far better precision than searching on one’s own or mutual filtering.

Current online dating sites are fully intoxicated with different versions of the FFI (Five Factor Inventory) / Big5, or with other proprietary models instead (like Chemistry or PerfectMatch), to measure personality traits, and all of those tests are more simplified than the 16PF5 normative personality test.

Breaking “the online dating sound barrier” is to achieve at least:
3 most compatible persons in a 100,000 persons database.
12 most compatible persons in a 1,000,000 persons database.
48 most compatible persons in a 10,000,000 persons database.

100 times better than the Compatibility Matching Algorithms used by current online dating sites!

The only way to achieve that is:
- using the 16PF5 normative personality test, available in different languages, to assess the personality of members, or a proprietary test with exactly the same traits as the 16PF5. The ensemble of the 16PF5 is 10^16 patterns, a big number given that the whole world population is nearly 6.7 × 10^9.

(Worldwide, there are over 5,000 (five thousand) online dating sites, but not one is using the 16PF5.)

- expressing compatibility with eight decimals, for example: the pattern 6.7.6.8.9.6.7.7.8.7.2.5.8.7.3.4 is 92.55033557% +/- 0.00000001% similar to the pattern 7.7.6.8.8.7.6.5.8.7.4.5.7.7.3.4, using a quantized pattern comparison method (part of pattern recognition by cross-correlation) to calculate similarity between prospective mates.
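
To make the idea of comparing two 16-trait patterns concrete, here is a minimal sketch. It is not the commenter's quantized method, which is not described here; it swaps in a plain zero-lag normalized cross-correlation (cosine similarity) as an assumed stand-in, so it should not be expected to reproduce the 92.55033557% figure. The function names are hypothetical.

```python
import math


def parse_pattern(text: str) -> list[int]:
    """Turn a dotted pattern such as '6.7.6.8' into a list of integer scores."""
    return [int(part) for part in text.split(".")]


def normalized_correlation(a: list[int], b: list[int]) -> float:
    """Zero-lag normalized cross-correlation (cosine similarity) of two patterns.

    Assumed stand-in for the commenter's method: returns 1.0 for identical
    patterns and a smaller value as the patterns diverge.
    """
    if len(a) != len(b):
        raise ValueError("patterns must have the same number of traits")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# The two example patterns quoted in the comment above.
pattern_a = parse_pattern("6.7.6.8.9.6.7.7.8.7.2.5.8.7.3.4")
pattern_b = parse_pattern("7.7.6.8.8.7.6.5.8.7.4.5.7.7.3.4")

similarity = normalized_correlation(pattern_a, pattern_b)
print(f"similarity: {similarity * 100:.8f}%")  # eight decimals, as in the comment
```

Whatever similarity measure is actually used, printing it to eight decimals only makes sense if the underlying scores are that precise, which is a separate question from the formula itself.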

That is the only way to revolutionize the Online Dating Industry.

All other proposals like VisualDNA, IntroAnalytics, Basisnote are ... NOISE

Regards,

Fernando Ardenghi.
Buenos Aires.
Argentina.
ardenghifer@gmail.com

David Evans February 9, 2010 at 10:23 pm

Let’s not get ahead of ourselves, 10% would be a nice start.

VisualDNA is a promising system. Remember, there is a profile builder and some other tools; it’s not only about matching. As for IntroAnalytics, you have no idea what they are capable of, and neither do they at this point, so I’d hold your tongue about judging them just yet.

Fernando, you should start a blog and post your findings there, then link to it. Really drill home the 3-per-100 matches; that’s a *powerful* argument.

As for the reality of 12 per million, that’s *never* going to happen. There’s math, science, psychology and the fact that we are flawed beings to take into account, but your sentiment rings true. Gotta build some noise into your own parameters to make it more real-world.
