Archive for the ‘information’ Category

Money and CA Propositions

Monday, June 7th, 2010

Since tomorrow we’ll be having another one of those practice democracy drills here in California, I thought I’d put together a few bar charts.

There are five propositions on tomorrow’s ballot. In researching them, Lena came across the Cal-Access Campaign Finance Activity: Propositions & Ballot Measures.

Unfortunately, for each proposition, you have to click through each committee to get the details for the amount of money they’ve raised and spent. Here’s a run-down in visual form. The only data manipulation I did was rounding to the nearest dollar. Note: no committees formed to support or oppose Proposition 13.

Here’s how much money was raised, by proposition:

Money Raised

Just in case you didn’t get the full picture, here is the same data plotted on a common scale:

Money Raised (common scale)

And the same two plots for money spent[1]:

Money Spent

Money Spent (common scale)

It could just be my perception of things, but I get pretty suspicious when there’s a ton of money involved in politics, especially when it’s this lopsided.

The only thing I have to add is you should Vote “YES” on Prop 15, because Larry Lessig says so, and so do the Alameda County Greens!

Update #1: Let me write it out in text, so that the search engines have an easier time finding this. According to the official record from Cal-Access (Secretary of State), as of May 22nd, 2010, $54.4 million had been spent in support of various propositions, most notably $40.5 million on Prop 16, $8.9 million on Prop 17, and $4.6 million on Prop 14. Compare that with a “grand” total of less than $1.2 million spent to oppose them, with a trivial $78 thousand (!!) to oppose Prop 16’s $40.5 million deep pockets.
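For anyone who wants to check those totals, here is a minimal sketch that re-adds the expenditure column of the `prop` table from the plotting script. The dictionary names are mine, made up for this check; the numbers are copied from the script.

```python
# Expenditures copied from the `prop` table in the plotting script
# (the second number on each row), keyed by proposition number.
spent_yes = {14: 4623830.07, 15: 264136.30, 16: 40582036.58, 17: 8932786.06}
spent_no = {14: 52796.71, 15: 86822.79, 16: 78063.91, 17: 965218.48}

total_yes = sum(spent_yes.values())
total_no = sum(spent_no.values())

# How many dollars backed Prop 16 for every dollar spent against it
prop16_ratio = spent_yes[16] / spent_no[16]

print("Support: $%.1f million" % (total_yes / 1e6))
print("Oppose:  $%.2f million" % (total_no / 1e6))
print("Prop 16: $%.0f in support for every $1 in opposition" % prop16_ratio)
```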

Update #2: The California Voter Foundation included more recent totals (they don’t seem to be that different), as well as a listing of the top 5 donors for each side of a proposition in their Online Voter Guide.

Also, here’s the python code used to generate these plots (enable javascript to get syntax highlighting):

# Create contributions and expenditures bar charts of committees supporting and
# opposing various propositions on the California Ballot for June 8th, 2010
# created by Paul Ivanov (http://pirsquared.org)

# figure(0) - Contributions by Proposition (as subplots)
# figure(1) - Expenditures by Proposition (as subplots)
# figure(2) - Contributions on a common scale
# figure(3) - Expenditures on a common scale

import numpy as np
from matplotlib import pyplot as plt
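import locale

# locale.currency() raises ValueError under the default 'C' locale, so pick up
# the user's locale first (assumed here to be a US English one, e.g. en_US.UTF-8)
locale.setlocale(locale.LC_ALL, '')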

# This part was done by hand by collecting data from CalAccess:
# http://cal-access.sos.ca.gov/Campaign/Measures/
prop = np.array([
     4650694.66, 4623830.07    # Yes on 14 Contributions, Expenditures
    , 216050, 52796.71         # No  on 14 Contributions, Expenditures
    , 118807.45, 264136.30     # Yes on 15 Contributions, Expenditures
    , 200750.01, 86822.79      # No  on 15 Contributions, Expenditures
    , 40706258.17, 40582036.58 # Yes on 16 Contributions, Expenditures
    , 83187.29, 78063.91       # No  on 16 Contributions, Expenditures
    , 10328675.12, 8932786.06  # Yes on 17 Contributions, Expenditures
    , 1229783.79, 965218.48    # No  on 17 Contributions, Expenditures
    ])
prop.shape = -1,2,2 

def currency(x, pos):
    """The two args are the value and tick position"""
    if x==0:
        return "$0"
    if x < 1e3:
        return '$%1.0f' % x  # whole dollars; a bare %f would print six decimals
    elif x< 1e6:
        return '$%1.0fK' % (x*1e-3)
    return '$%1.0fM' % (x*1e-6)

from matplotlib.ticker import FuncFormatter
formatter = FuncFormatter(currency)

yes,no = range(2)
c = [(1.,.5,0),'blue']  # color for yes/no stance
a = [.6,.5]             # alpha for yes/no stance
t = ['Yes','No ']       # text  for yes/no stance

raised,spent = range(2)
title = ["Raised for", "Spent on" ] # reuse code by injecting title specifics
field = ['Contributions', 'Expenditures']

footer ="""
Data from CalAccess: http://cal-access.sos.ca.gov/Campaign/Measures/
'Total %s 1/1/2010-05/22/2010' field extracted for every committee
and summed by position ('Support' or 'Oppose').  No committees formed to
support or oppose Proposition 13. cc-by Paul Ivanov (http://pirsquared.org).
""" # will inject field[col] in all plots

color = np.array((.9,.9,.34))*.9 # spine/ticklabel color
plt.rcParams['savefig.dpi'] = 100

def fixup_subplot(ax,color):
    """ Tufte-fy the axis labels - use different color than data"""
    spines = list(ax.spines.values())
    # liberate the data! hide right and top spines
    # (look them up by name: the order of ax.spines is not guaranteed)
    [ax.spines[side].set_visible(False) for side in ('right', 'top')]
    ax.yaxis.tick_left() # don't tick on the right

    # there's gotta be a better way to set all of these colors, but I don't
    # know that way, I only know the hard way
    [s.set_color(color) for s in spines]
    [s.set_color(color) for s in ax.yaxis.get_ticklines()]
    [s.set_visible(False) for s in ax.xaxis.get_ticklines()]
    [(s.set_color(color),s.set_size(8)) for s in ax.xaxis.get_ticklabels()]
    [(s.set_color(color),s.set_size(8)) for s in ax.yaxis.get_ticklabels()]
    ax.yaxis.grid(which='major',linestyle='-',color=color,alpha=.3)

# for subplot spacing, I fiddle around using the f.subplot_tool(), then get
# this dict by doing something like:
#    f = plt.gcf()
#    adjust_dict= f.subplotpars.__dict__.copy()
#    del(adjust_dict['validate'])
#    f.subplots_adjust(**adjust_dict)

adjust_dict = {'bottom': 0.12129189716889031, 'hspace': 0.646815834767644,
 'left': 0.13732508948909858, 'right': 0.92971038073543777,
 'top': 0.91082616179001742, 'wspace': 0.084337349397590383}

for col in [raised, spent]: #column to plot - money spent or money raised
    # subplots for each proposition (Fig 0 and Fig 1)
    f = plt.figure(col); f.clf(); f.dpi=100;
    for i in range(len(prop)):
        ax = plt.subplot(len(prop),1, i+1)
        ax.clear()
        p = i+14    #prop number
        for stance in [yes,no]:
            plt.bar(stance, prop[i,stance,col], color=c[stance], linewidth=0,
                    align='center', width=.1, alpha=a[stance])
            lbl = locale.currency(round(prop[i,stance,col]), symbol=True, grouping=True)
            lbl = lbl[:-3] # drop the cents, since we've rounded
            ax.text(stance, prop[i,stance,col], lbl , ha='center', size=8)

        ax.set_xlim(-.3,1.3)
        ax.xaxis.set_ticks([0,1])
        ax.xaxis.set_ticklabels(["Yes on %d"%p, "No on %d"%p])

        # put a big (but faded) "Proposition X" in the center of this subplot
        common=dict(alpha=.1, color='k', ha='center', va='center', transform = ax.transAxes)
        ax.text(0.5, .9,"Proposition", size=8, weight=600, **common)
        ax.text(0.5, .50,"%d"%p, size=50, weight=300, **common)

        ax.yaxis.set_major_formatter(formatter) # plugin our currency labeler
        ax.locator_params(axis='y', nbins=5) # put fewer tickmarks/labels

        fixup_subplot(ax,color)

    adjust_dict.update(left=0.13732508948909858,right=0.92971038073543777)
    f.subplots_adjust( **adjust_dict)

    # Figure title, subtitle
    extra_args = dict(family='serif', ha='center', va='top', transform=f.transFigure)
    f.text(.5,.99,"Money %s CA Propositions"%title[col], size=12, **extra_args)
    f.text(.5,.96,"June 8th, 2010 Primary", size=9, **extra_args)

    #footer
    extra_args.update(va='bottom', size=6,ma='left')
    f.text(.5,0.0,footer%field[col], **extra_args)

    f.set_figheight(6.); f.set_figwidth(3.6); f.canvas.draw()
    f.savefig('CA-Props-June8th2009-%s-Subplots.png'%field[col])

    # all props on one figure (Fig 2 and Fig 3)
    f = plt.figure(col+2); f.clf()
    adjust_dict.update(left= 0.06,right=.96)
    f.subplots_adjust( **adjust_dict)
    f.set_figheight(6.)
    f.set_figwidth(7.6)

    extra_args = dict(family='serif', ha='center', va='top', transform=f.transFigure)
    f.text(.5,.99,"Money %s CA Propositions"%title[col], size=12, **extra_args)
    f.text(.5,.96,"June 8th, 2010 Primary", size=9, **extra_args)

    extra_args.update(ha='left', va='bottom', size=6,ma='left')
    f.text(adjust_dict['left'],0.0,footer%field[col], **extra_args)

    ax = plt.subplot(111)
    for stance in [yes,no]:
        abscissa=np.arange(0+stance*.30,4,1)
        lbl = locale.currency(round(prop[:,stance,col].sum()),True,True)
        lbl = lbl[:-3] # drop the cents, since we've rounded
        lbl = t[stance]+" Total"+ lbl.rjust(12)
        plt.bar(abscissa,prop[:,stance,col], width=.1, color=c[stance],
                alpha=a[stance],align='center',linewidth=0, label=lbl)
        for i in range(len(prop)):
            lbl = locale.currency(round(prop[i,stance,col]), symbol=True, grouping=True)
            lbl = lbl[:-3] # drop the cents, since we've rounded
            ax.text(abscissa[i], prop[i,stance,col], lbl , ha='center',
                    size=8,rotation=00)

    ax.set_xlim(xmin=-.3)
    ax.xaxis.set_ticks(np.arange(.15,4,1))
    ax.xaxis.set_ticklabels(["Proposition %d"%(i+14) for i in range(4)])
    fixup_subplot(ax,color)

    # plt.legend(prop=dict(family='monospace',size=9)) # this makes legend tied
    # to the subplot, tie it to the figure, instead
    handles, labels = ax.get_legend_handles_labels()
    l = plt.figlegend(handles, labels,loc='lower right',prop=dict(family='monospace',size=9))
    l.get_frame().set_visible(False)
    ax.yaxis.set_major_formatter(formatter) # plugin our currency labeler
    f.canvas.draw()
    f.savefig('CA-Props-June8th2009-%s.png'%field[col])

plt.show()
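The bar labels in the script go through `locale.currency`, so they depend on the locale settings of whatever machine runs it. As a hedged alternative (not what produced the plots above), the same whole-dollar labels can be built with plain string formatting. The helper name `dollars` is made up for this sketch.

```python
def dollars(x):
    """Format a number as whole dollars with thousands separators."""
    # ',' adds the grouping commas, '.0f' rounds to the nearest dollar
    return "${:,.0f}".format(x)

# A few values from the `prop` table
for amount in (4650694.66, 216050, 83187.29):
    print(dollars(amount))
```

This also removes the need for the `lbl[:-3]` step, since no cents are ever printed.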
  1. I don’t fully understand what these numbers mean, as some groups’ “Total Expenditures” exceed their “Total Contributions” and they still have positive “Ending Cash”.

Immigration in the US, contextualized (with pictures)

Saturday, May 29th, 2010

So I probably don’t need to tell you this since you already know, but

Arizona sucks!

It turns out that even documented immigrants agree, and I have the graphs to prove it!

You see, it all started when I took a great Visualization course this past term which was taught by Maneesh Agrawala. Maneesh gave enough structure for the assignments, but also left some aspect of each open-ended. For example, our first assignment had a fixed dataset which everyone had to make a static visualization of, but the means by which we did that was entirely up to us. A lot of people used Excel (in a graduate-level CS class? gross!), some people wrote little programs (I wrote mine in python using matplotlib and numpy, and did some cool stuff that I will have to post about another time and contribute back to matplotlib), there was even a poor sap who did it all in Photoshop, as I recall, but anything was fair game. Turns out we could even just draw or make something by hand and turn it in!

The second assignment, the source of my graphs which quantitatively demonstrate the suckiness of Arizona, required us to use interactive visualization software to iteratively develop a visualization by first asking a question, then making a visualization to address this question, and going back several times to refine the question and make successive visualizations.

One thing to keep in mind is that, overall, naturalized citizens are both an exclusive and a discerning lot. In most cases, you have to be a permanent resident (have a Green card) for 5 years before you can apply. And there are quotas for how many people can get a Green card every year, so there are lots of hoops to jump through. Given the amount of effort involved, wouldn’t it be nice to look at a breakdown of naturalized citizens by state? Because that would give us an idea about which states immigrants perceive as, for lack of a better word, “awesome”, or if you’re not so generous, “least sucky”. I bet you’ll feel that this second description is more appropriate once you take a look at the data, but keep my “least sucky” premise in mind as you read my original write-up which focused on a different angle (but from which we can still draw some reasonable conclusions). I’ll return to make a few more comments about the title of this post after the copy-pasted portion.

here’s my original write-up:

begin cut —>

There are three kinds of lies: lies, damned lies, and statistics.

As an immigrant, I’ve always had the subjective feeling that about half of the people I’m acquainted with are either themselves immigrants, or the children of immigrants. The US prides itself in being a melting pot, a country built by immigrants, so I wanted to dive into the data that would help me understand just how large of a role immigration plays in terms of the entire country. The question I started with, for the purpose of this assignment is this:

What’s the relationship between naturalizations and births in the US?

But what I really wanted to find out was what kind of question I need to ask to get the answer that would be consistent with my world view. :)

To do this, I started with the DHS 2008 Yearbook of Immigration Statistics, which was linked from the class website.

The file I started with was natzsuptable1d.xls, which required cleanup before I could read it into Tableau. Turns out that even though “importing” to Tableau format is supposed to speed things up, it seems very fragile and would regularly fail when I tried converting type to Number (there were some non-numeric codes, like ‘D’ for ‘Data withheld to limit disclosure’). **NOT** importing to Tableau’s desired format also had the added benefit of allowing me to change the .xls files externally, and having all the adjustments made in Tableau, without having to re-import the data source.

Frustratingly, the last column and last row kept not getting loaded in Tableau! I also ran into an issue which I think had to do with the ‘Unknown’ country of origin and ‘Unknown’ state of naturalization which made the totals funky. It took a while to figure out, but there was a problem with Korea, because there was a superscript 1 by it, indicating that data from North and South Korea were combined.
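For what it’s worth, this kind of cleanup can also be scripted before the spreadsheet ever reaches Tableau. The following is only a sketch, not what I did at the time: the helper names are made up, and I’m assuming the withheld cells hold a code like 'D' and that the footnote marker shows up as a digit (or superscript digit) glued to the end of the country name.

```python
import re

def clean_count(cell):
    """Return an int for a numeric cell, or None for codes such as 'D'."""
    text = str(cell).strip().replace(",", "")
    if re.fullmatch(r"\d+", text):
        return int(text)
    return None  # withheld or otherwise non-numeric

def clean_label(label):
    """Drop a trailing footnote marker, e.g. 'Korea1' -> 'Korea'."""
    return re.sub(r"(?<=[A-Za-z])[\d\u00b9\u00b2\u00b3]+$", "", label.strip())

print(clean_count("1,234"), clean_count("D"))
print(clean_label("Korea1"), clean_label("Mexico"))
```

Returning `None` rather than 0 for withheld cells keeps them from silently lowering a total.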

I was trying to use the freshest data possible, so I used the CDC’s National Vital Statistics System report titled Births: Preliminary Data for 2007. I just had to copy and paste the desired data, and massage it to fit the proper order of columns in the Excel table I already had handy. I put zeros for U.S. Armed Services Posts and similar territories which is probably not accurate, but this data was not available in the reports that I found. Interesting factoid: according to NVSS (CDC), in 2007 there were more people born in NYC than in the rest of the state combined (about 129K vs 126.5K). The only caveat with this data is that it contains only 98.7% of the data. The states with some missing portion of their data tabulated are Michigan (at 80.2% completeness), Georgia (86.4%), Louisiana (91.4%), Texas (99.4%), Alaska (99.7%), Nevada (99.7%), Delaware (99.9%). Thus, state-level analysis for MI, GA, and LA may be distorted.

The data I had from DHS is for Fiscal Year 2008, which, as it turns out, goes from October 1st, 2007 – Sept 30th, 2008. Thus, no matter which combination of NVSS and DHS datasets I used, there would necessarily be a mismatch in the date range covered by each, so I settled with describing my visualization as “using the latest available data”, noting the actual dates for each dataset in the captions. Also, the NVSS report contained a graph of births over time, which fluctuates very modestly from year-to-year, thus the visualization would not change qualitatively if I had 2008 birth data on hand.

I was having a really hard time trying to get a look at the data I wanted to see in one sheet, and ended up trying to make a dashboard that combined several sheets. I couldn’t figure out a good way to link the different states across datasets. I struggled for quite a while to pull out the data that I wanted to look at, and ended up having to copy-paste everything from DHS and NVSS (transposed) onto a new sheet in Gnumeric.

Here’s the result:

Initial visualization

So, in all of the US, about 1 in 5 new American citizens is an immigrant, or for every four births, we have one naturalization. That was kind of unsatisfying. I’ve lived in California the entire time I’ve been in the US, and I feel that at least California is more diverse than that. There’s all those states in the middle of the country that few people from the rest of the world would want to immigrate to, yet the people living in them are still having babies, throwing off the numbers which would otherwise support my subjective world view…

So I decided to look at the breakdown by state.

Broken down by state, what’s the relationship between naturalizations and births in the US?

my second iteration

I added the reference lines so that you could both read off the approximate total easier, and be able to do proportion calculations visually, instead of mentally. This started looking promising, as I’ve only lived in California, and it looks like it’s got quite a lot of immigrants as a portion of total new citizens.

It was still kind of hard to see the totals, so I decided to create my very first calculated field, which had the very simple formula [Births in 2007]+[Total Naturalized]. Using this new field, I could now make a map, to see the growth broken down geographically. This was just a way of reaffirming my earlier bias against the middle states having babies without attracting a sufficient number of immigrants to conform to my world view.

gratuitous map (was too easy to do using the software)

In the breakdown by state bar graph, it was also difficult to visually compare the total births by state, because they all started at a different place, depending on the number of naturalizations for that state. So I decided to split the single bar and make small multiples for each state.

back to something more interpretable

It’s interesting that the contribution of naturalizations slightly changes the ordering of the growth of states. For example, Florida has fewer births than New York, yet its total growth is larger, because it naturalized 30,000 more people than New York. With this small multiples arrangement, it was now possible to do positional comparisons across categories, not just between naturalizations and totals. Turns out that more people get naturalized in California than are born in the entire state of New York. And since New York has the third highest number of births annually, more people got naturalized in California than are born in any state other than CA and TX.

This was too large of a graph, and the story I’m interested in is really the ratio between births and naturalizations (the closer to 1:1, the better), so I made another calculated field, which is exactly such a ratio, multiplied by a factor of a thousand, so I could give it a sensible description (Naturalizations per 1000 births). This refines my question:

For every 1000 people born in the US, how many immigrants become naturalized?

I then ordered on these ratios, and decided to filter the top states. Guam would have made the cut, but it is not a state, and (though I didn’t mention it earlier) its NVSS birth data was only 77% complete, so I excluded it. Fifteen is a nice odd number, but it actually marked a nice transition, as after Texas, everything else is less than 200 naturalizations per 1,000 births.
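The two calculated fields and the ordering step are simple enough to sketch outside of Tableau. Everything below is illustrative: the state names and counts are placeholders I made up, NOT the DHS or NVSS figures, and the variable names are hypothetical.

```python
# Placeholder counts for made-up states (NOT the real DHS/NVSS data)
births = {"State A": 100000, "State B": 50000, "State C": 20000, "State D": 80000}
naturalized = {"State A": 40000, "State B": 5000, "State C": 9000, "State D": 12000}

def per_thousand_births(nat, born):
    """Naturalizations per 1000 births."""
    return 1000.0 * nat / born

# First calculated field: [Births in 2007]+[Total Naturalized]
combined = {s: births[s] + naturalized[s] for s in births}

# Second calculated field: the ratio, scaled by a thousand
ratios = {s: per_thousand_births(naturalized[s], births[s]) for s in births}

# Order on the ratios and keep only the top of the list
top_n = 2
ranked = sorted(ratios, key=ratios.get, reverse=True)
top = ranked[:top_n]

for state in top:
    print(state, round(ratios[state]), combined[state])
```

Note how "State C" ranks first despite having the fewest births, which is the same effect that lets small states show up once you order by ratio instead of by totals.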

The small multiples bar graphs still looked too busy, and there was redundancy in the data, which didn’t tell a succinct story. So I switched to just look at the ratios alone. This revealed that, indeed, the fact that I’ve been living in California makes my perspective quite unique, as it is one of three states, along with Florida and New Jersey, to have an outstandingly large number of naturalizations compared to births. It is so high, indeed, that it puts the naturalization per births rate in these three states at more than twice the national average!

Looking at the ratio alone tells us about the diversity in each state’s growth, but it carries more meaning in the context of total growth. Thus, I added the combined totals (naturalizations and births) as a size variable, for context. The alternating bands both make it easier to read off the rows and aid the comparison of sizes by framing every data point in a common reference window. This makes it obvious that California is the state with 864,261 new citizens, because it fills the frame completely.

Final question: What are the Top 15 “Melting Pot” States?

almost done, would be nice to include context from the visualization I started with

Ordering the data in this way also shed light on the small but still very diverse states that would not have otherwise made the cut (and did not pop out in any manner on my previous bar graphs). Rhode Island and Hawaii got it going on, in terms of attracting immigrants.

Certainly the fact that I’m an immigrant myself also greatly influences whom I associate with, further skewing my world view towards a 1:1 ratio, but I’m actually quite impressed with just how close that ratio is in California – 1:1.9. Of course, the data I’ve analyzed does not include the American-born 1st generation of children, nor does it take into account the number of immigrants living in the US that do not have citizenship. All of these factors would surely push the ratio even closer toward 1:1.
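To connect the different ways these ratios get quoted, here is a small back-of-the-envelope conversion. It only uses the rounded figures mentioned in this write-up (one naturalization for every four births nationally, 1:1.9 in California), so treat the results as approximate. Reading 1:1.9 as naturalizations to births is my assumption, and the function name is made up.

```python
def per_thousand(births_per_naturalization):
    """Convert '1 naturalization per N births' to naturalizations per 1000 births."""
    return 1000.0 / births_per_naturalization

national = per_thousand(4.0)    # "for every four births, we have one naturalization"
california = per_thousand(1.9)  # the 1:1.9 ratio for California

# Share of new citizens who are naturalized: 1 out of (1 + 4)
national_share = 1.0 / (1.0 + 4.0)

print("national: %.0f per 1000 births" % national)
print("California: %.0f per 1000 births" % california)
print("California is above twice the national figure:", california > 2 * national)
```

With these rounded inputs, the "1 in 5 new citizens" and "one per four births" descriptions come out as the same number, and California lands above the "twice the nat’l avg" line.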

I decided to combine the US total growth information, since it gives further perspective on the entire data set, such as the fact that California accounts for about 16% of total US growth. It also sheds light on how the US average was calculated. A new “twice the nat’l avg” line makes explicit the three most diverse outlier states mentioned before. I also changed the colors to match the convention used in the bar charts made earlier. The US combined total line semantically links the data plotted with the national growth bar chart – i.e. the green dots are formed by the sum of born and naturalized citizens.

What are the Top 15 "Melting Pot" States?

<—- end of cut

Ok, so, to be honest, it turns out that I wrote a large chunk of this post (Arizona suckage included) before I actually looked back at my visualizations, only going off my memory that it wasn’t in the top 10. So Arizona is just below the national average in this “Melting Pot” ratio (a measure I made up, the number of naturalizations per 1000 births). Since it is #12, some might say, “Paul, Arizona’s on your top 15 list”, to which I’ll reply: “So’s Texas.”

I guess I just wanted to share these purdy graphs I made a few months back, and it seemed like there was a somewhat topical angle on them a few weeks back, when I remembered that I hadn’t posted them on here yet. Anyway, I’d love to hear back your thoughts.

Publisher’s Block

Saturday, December 26th, 2009

One of the reasons I find it so difficult to get more than a couple of entries in per year is that I know they aren’t going anywhere after I post them. They’re sticking around for a while, and if they’re full of trivial crap then that doesn’t reflect very well on me. Posting about trivial stuff was ok when I was still trying to establish a sense of identity. These days, when I write something public, say on a mailing list, I agonize over every detail because I know that this digital breadcrumb with my name attached will be around forever. So I keep raising the stakes to myself, neurotically checking over every possible extra whitespace in a patch I send in, sinking hours into something that should have taken 15 minutes.

I’m finally getting to the point where I realize it’s a problem that, for example, even when I’m texting someone, I try to get all of the spelling and punctuation correct.

It’s slowing me down.

I’ve had a lot of half-written blog posts that, after stepping away from them for a short while, just don’t seem significant enough. I try to only publish pieces that either I think about for a while, or that I’m not hearing/reading others write about. But I’m always mindful about adding noise. The way I see it, when it became super easy for anyone to publish online, a lot of content flooded in that I simply don’t care for. Same idea with web 2.0 – because of Ruby on Rails, Django, and other web frameworks, writing a fancy (but useless) website became super easy – and now we’re oversaturated with them[1]. So there’s this internal tension: I think there’s too much crap-content out there but at the same time my internal filter keeps me from publishing anything. I rarely express my thoughts about what I find important in writing anymore. Others don’t seem to make such a big deal about self-filtering, and are much more prolific writers/bloggers/coders, etc.

So here’s a new acronym-sized motto to help correct this behavior, which is starting to get sprinkled in comments in the software I’m writing for my research: LTS. Life’s too short.

LTS

LTS has been showing up in my code comments

I use it as a reminder of what in the past was one of my frequently used maxims: most things in life are pass or fail. This doesn’t mean that it’s ok to do a half-assed job on everything, but given that there’s a limited amount of time, I should focus my efforts only on that which is truly important. Typos in a text message or extra trailing whitespace do not qualify as such.

I wasn’t always this careful about what I publish. I’ve had some form of internet presence (as embarrassing as it may seem now) since I was in middle school. It started in one of those geocities neighborhoods, I don’t even remember any details right now, probably because my brothers helped me to set it up. I didn’t use my real name until I started a poetry website freshman year in high school.

I used my full name, because I wanted to express my thoughts and have them be connected back to my persona, not a pseudonym that I might grow tired of. I was quite explicit about this at the time. And I didn’t filter myself: I just counted a total of 20 poems on there which were written in the course of a year. None of them really make me cringe, and some I’m still quite proud of.

I had nothing to gain by hiding behind an alias. I think that attaching my real name somehow made my thoughts sincere. I started blogging socially my senior year in high school (livejournal), and looking back on the first entry there, I was just trying to capture day-to-day events and thoughts. Vim, THE editor, is mentioned five times in the first two entries :) . But there are some very candid and thoughtful remarks in there, too.

It’s kind of funny to have your more than 10 year old website cited in a Yahoo! Answer to the question: “What is the best way to live life to the fullest?”.

Basement Cited

My first website recently cited in Yahoo! Answers

I mean, it is yahoo answers, we’re really scraping the bottom of the barrel when it comes to content[2], but it’s still cool. Yeah, ok, so it’s doubly embarrassing because the citation is just for the lyrics to “The Sunscreen Song”. I’m ok with that.

And I’m very grateful for my many friends and colleagues who, by their example, continue to give me the courage to release my thoughts and code out in the open. Thank you.

As I was putting my finishing touches on this post, I found a recent entry on Scot Hacker’s blog titled “(I Don’t Care About) Facebook and Privacy” that covers similar ground: “For me, it’s simple: If what you have to say shouldn’t be said to the whole world, then don’t say it online.” I agree, and it’s a more sensible standard than my “everything you say will forever be connected to you, so don’t screw it up!” But just to be clear, this should only apply to things you intend to write up and release: I absolutely oppose Eric Schmidt’s dismissal of privacy. Eric says, “If you have something that you don’t want anyone to know, maybe you shouldn’t be doing it in the first place.” Due to its construction, it bears striking similarity to Scot’s quote above with which I mostly agree. But to me, Eric’s statement is a 1984-sized world apart.

Anyway, hopefully I’ve adequately explained my “publisher’s block”, and there are many related topics left to explore, but this is where I’ll have to end this post for now. LTS.

  1. Though this problem will probably sort itself out with time. I didn’t intend to write about this now, so I’ll just keep that remark without developing it further.
  2. in fact, Elaine absolutely refuses to read anything on that site anymore, despite the fact that frequently, her google search string is verbatim the same as the question which comes up as one of the top results

thoughts about the sea of information

Tuesday, July 31st, 2007

Everything is Miscellaneous

I just finished reading[1] David Weinberger’s Everything is Miscellaneous and I find it to be a pretty engaging description of how the state of knowledge evolved with time, and now it has given me a chance to write down some thoughts.

The basic gist of the book is that knowledge is no longer tied to the physical (e.g. books), which used to limit how one went about organizing and finding it (e.g. Dewey decimal system). Now we can attach as much metadata as our hearts desire, which technology helps us sift through to help us find what we want. Instead of each book having a particular place, as in a warehouse, or a relative position (alphabetical within a subject), an individual leaf of information lives on a multitude of trees simultaneously, and the trees themselves are dynamically created and rearranged for each user on the fly.

The first few chapters focused on how knowledge has been historically organized over the centuries. I did skim through a few of the middle chapters; they seemed to be pretty straightforward commentary on the digital lives most of us now lead – user created content, social tags and lists, auto-recommendation, etc. Some of it is over-simplified, with that sometimes unavoidable awkwardness that comes out of describing something neat and complex yet obvious to those leading digital lives. It was refreshing to read about the downsides of scientific publications like Nature and Science (e.g. good science isn’t enough[2] to publish because of how few articles get in, the research has to be “sexy”) and how the newcomer PLoS One aims to correct these shortcomings. This was just the topic that was discussed at the Neuroscience retreat last year (in a lecture about the then-upcoming PLoS One), so scientists care about this stuff and it comes back every so often.

Although I never considered it myself, I totally got it when Danae started her Master of Library Science. I would argue that more than anything else, what we’re producing most of in the world today is information. Perhaps capture and disseminate is a more appropriate description. Information, by itself, is agnostic to how it gets used (or abused). But the Cliff Stoll-ian side of me says that we should be wary of the exponentially growing amount of information, and not just for the obvious Big Brother / privacy reasons (e.g. “Plate reader draws objections of ACLU”).

The non-obvious threat of information is that we’re drowning in it (my claim). Here I’m glad Weinberger mentions Cass Sunstein’s book Republic.com[3], the basic thesis of which[4] is that with more and more information out there, we can all end up listening, watching, and reading only that which reinforces our world view – drowning out everything else without even having to plug up our ears and going “LALALALALA”, but by finding podcasts, channels, and blogs where others are doing the “LALALALALA” for us.

Touched by His Noodly Appendage

In many ways, this leads to huge portions of the population nonsensically parroting something like “Evolution is just a theory” to one another. Scientific theories both explain observed phenomena (why living organisms share so much of their DNA) and make predictions about future observations (my niece’s hair color based on that of her parents, or maybe one you don’t hear about so often: regular use of antibacterial soap might be a bad idea, placing evolutionary pressure on the bacteria to evolve immunity to the soap). Moreover, simpler or more elegant, straightforward theories are preferred (aka Occam’s Razor). Which is why Intelligent Design is on par with Flying Spaghetti Monsterism, not science. But this has been better described elsewhere (suggestions welcome). The point is that I’m worried that there’s no way anyone can get through to the people that end up isolating themselves in their own feedback loops. I worry that not enough people engage enough to think on their own. Technology can’t fix this problem. No amount of metadata will ever be enough[5].

In this entry, I’ve linked to Wikipedia a few times, and while I agree it should not be regularly used for primary research, I also welcome the explicit uncertainty inherent in a publicly editable wiki, as it reflects the tentative nature of information, and I think we should be somewhat skeptical about a great deal. Manuel Castells’ The Internet Galaxy has also been recommended to me; I have not read it yet, and perhaps it is more topical for a future post I’ve been brewing for a while. Has anyone read it? …Anyway, this is my first pass at processing this stuff, hope it’s not too scatterbrained[6].

  1. In three evening sittings at Moe’s Books
  2. some might even argue “isn’t required”
  3. Republic.com starts with a succinct vignette: “the daily me”
  4. on my quick skimming at the UCD bookstore this past Picnic Day.
  5. a point I think the book misses
  6. Cory Doctorow does a better job reviewing the book.