1. pheriday 2: termcasting overview

    pheriday 2: termcasting overview (2012-08-03) from Paul Ivanov on Vimeo.

    paul's habitual errant ramblings (on Fr)idays (2012-08-03)

    show notes:
    http://pirsquared.org/blog/2012/08/04/termcasting/
    gopher://sdf.org/1/users/ivanov/pheridays/2012-08-03 (yes, gopher!)

    1. try to not say "uuuuhhhhmmnn"

    BAM/PFA Summer Cinema on Center Street http://bampfa.berkeley.edu/filmseries/summercinema

    1. SciPy 2012 videos up, go check them out! (I have!) http://www.youtube.com/nextdayvideo (removed nextdayvideo internal box url, by request)

    Software Carpentry: Record and Playback post http://software-carpentry.org/2012/07/record-and-playback/

    1. termcasting: a review of what's out there.

    http://termcast.org and http://alt.org/nethack/ - mostly nethack stuff; both just use the telnet protocol, and only support live sessions (though there are "TV" scripts to re-run sets of ttyrec files).

    (playterm vs ascii.io vs shelr.tv)

    tldr: playterm.org supports ttyrec files, but has the most primitive player. The players on ascii.io and shelr.tv can both seek. shelr.tv can also speed up playback! The downside is that both of those have their own recorder programs (though at least shelr leverages script or ttyrec).

    http://playterm.org/
    - supports ttyrec files, outgoing links for author and related article, comments
    - most primitive player (https://github.com/encryptio/jsttyplay/) - Pause only
    - only terminal sizes of 80x24 or 120x35
    - supports tags and comments
    - service only (code for playterm.org does not seem to be available, though jsttyplay is doing the hardest part of actual playback)

    http://ascii.io/
    - supports non-standard terminal sizes
    - player can seek
    - aesthetic thumbnail previews
    - login via github or twitter credentials (for uploads)
    - code for website available (ruby and javascript): https://github.com/sickill/ascii.io
    - code for recorder available (python): https://github.com/sickill/ascii.io-cli

    http://shelr.tv/
    - supports non-standard terminal sizes
    - player can seek
    - player playback speed can be increased (currently up to 10x of real time)
    - supports tags, comments and voting up/down on a video
    - shelr can playback from the command line ("shelr play http://shelr.tv/records/4f8f30389660802671000012.json")
    - code for website available (ruby and javascript) [AGPLv3]: https://github.com/shelr/shelr.tv
    - code for recorder available (ruby) [GPLv3]: https://github.com/shelr/shelr

    1. my wanted list for termcasting
       - should support ttyrec files (upload and download)
       - live-streaming (like ustream - but for coding)
         - termcast.org has ttrtail which does just this
       - quick "encrypt" switch - to keep streaming, but start GPG-encrypting the stream as it goes out, so you can still look at it later. This would make it easy to leave the streaming on all the time
       - a .tty editor that's like a video editor, to cut out portions [i.e. dead time] (a rough sketch of the idea follows below)
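
    Here's roughly what I have in mind for that last item - a minimal sketch (mine, not an existing tool) of a "dead time" trimmer. It assumes the standard ttyrec record layout: a 12-byte header of three little-endian uint32s (seconds, microseconds, payload length), followed by that many bytes of raw terminal output.

    import struct
    import sys

    def trim_dead_time(src, dst, max_gap=2.0):
        """Copy a ttyrec recording, squashing any pause longer than max_gap
        seconds down to max_gap - a crude 'cut out the dead time' edit."""
        out_time = None
        prev = None
        with open(src, 'rb') as fin, open(dst, 'wb') as fout:
            while True:
                header = fin.read(12)
                if len(header) < 12:
                    break  # end of recording
                sec, usec, length = struct.unpack('<III', header)
                data = fin.read(length)
                t = sec + usec * 1e-6
                if prev is None:
                    out_time = t  # keep the original start timestamp
                else:
                    out_time += min(t - prev, max_gap)  # squash long pauses
                prev = t
                new_sec = int(out_time)
                new_usec = int((out_time - new_sec) * 1e6)
                fout.write(struct.pack('<III', new_sec, new_usec, length))
                fout.write(data)

    if __name__ == '__main__':
        trim_dead_time(sys.argv[1], sys.argv[2])

    Run it as "python trim_ttyrec.py session.tty trimmed.tty" (the script name is just what I'd call it), and the ttyrec-capable players above should treat the output as a normal recording.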

    This is a low-bandwidth way of capturing what I'm working on and thinking about. Now, I'm going to try to record everything I do! "ttyrec -e screen -x". I've only done it a couple of times so far while coding, but I find being able to go back and re-view (and review) what I worked on at the end of the day to be really helpful.

    I was inspired by Joey Hess' "git-annex coding in haskell" where he reviews and narrates some of the code he wrote, after he wrote it. http://joeyh.name/screencasts/git-annex_coding_in_haskell/

    P.S. It's Saturday now. I tried to save some local disk space by running recordmydesktop using the --on-the-fly-encoding option, and that was a mistake. The audio and video were (un)hilariously desynchronized - the audio ran for 9:48, but the video wanted to be just 7:30. Audacity came to the rescue by allowing me to change the tempo to be 30% faster, which made the syncing better. And then I used avconv to stitch in the faster audio.
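
    (For the curious, that factor falls right out of the two durations above - a quick sanity check in python:)

    # back-of-the-envelope check of the tempo fix, using the durations above
    audio_s = 9 * 60 + 48            # the audio ran for 9:48
    video_s = 7 * 60 + 30            # the video wanted to be 7:30
    speedup = audio_s / float(video_s)
    print("audio needs to play about %.0f%% faster" % ((speedup - 1) * 100))
    # prints ~31%, which is why a ~30% tempo increase got things back in sync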

    tools used: Debian GNU/Linux sid, recordmydesktop, xmonad, fbpanel, screen, chromium, cheese, xcompmgr, audacity, avconv

  2. pheriday 1: software carpentry, digital artifacts, visiting other OSes

    Here's pheriday 1, another edition of paul's habitual errant ramblings (on Fr)idays

    pheriday 1: software carpentry, digital artifacts, visiting other OSes (2012-07-27) from Paul Ivanov on Vimeo.

    2012-07-27.mp4 (28 MB) 2012-07-27.avi (71 MB) 2012-07-27.ogv (321 MB)

    show notes

    I had three topics I wanted to cover today, and ended up spending about an hour thinking about what I was going to say and which resources I was going to include. This was too long, and the end result was still very rambling, but I think I'll get better at this with more practice.

    SDF Public Access UNIX System http://sdf.org gopher://sdf.org/1

    1. try to not say "uuuuhhhhmmnn"
    2. be lazy (software carpentry)
    3. flipside of "publisher's block" (git-annex)
    4. visiting another country (windows 8 release preview)

    As usual, I didn't know what I was really trying to say in 0, and here's a really good overview of what I meant: Software Carpentry

    If you have 90 seconds, watch the pitch

    play it in 60 seconds, instead: mplayer -af scaletempo -speed 1.5 Software_Carpentry_in_90_Seconds-AHt3mgViyCs.flv

    1. publisher's block

    Jaron Lanier's You Are Not A Gadget What I mention in the video is not at all the main point of Lanier's book (which is quite good!), and in fact, his book is a critique of (over) digitization. Nevertheless, I'm only pointing out that there are redeemable aspects of an increasingly digital-artifact-producing life, such as preservation.

    David Weinberger’s Everything is Miscellaneous

    My review of David Weinberger’s Everything is Miscellaneous, where I go into more depth about "information overload".

    git-annex Excellent project. The technical detail is that when you "annex" files, they are renamed to a long hash of their contents (bit-rot resistant!) and stored in a .git/annex/objects directory, and in place of where the file was, you get a symlink to that annexed content - the symlink is what gets added to git. So git only keeps track of symlinks, and additionally has a git-annex branch that keeps track of all known annexes, so that you can copy, move, and drop files from the ones that are accessible. Very handy!
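
    To make that concrete, here's a toy sketch of the annexing idea - this is not git-annex's actual key format or directory layout, just an illustration of content-addressed storage plus a symlink:

    import hashlib
    import os

    def annex_like(path, objects_dir='.git/annex/objects'):
        """Toy illustration of annexing: move a file's content into a
        content-addressed store and leave a symlink in its place."""
        with open(path, 'rb') as f:
            key = hashlib.sha256(f.read()).hexdigest()  # name derived from content
        if not os.path.isdir(objects_dir):
            os.makedirs(objects_dir)
        target = os.path.join(objects_dir, key)
        if os.path.exists(target):
            os.remove(path)           # content already annexed, drop the duplicate
        else:
            os.rename(path, target)   # stash the content under its hash
        # git tracks only this symlink; the (possibly huge) content lives in the store
        os.symlink(os.path.relpath(target, os.path.dirname(path) or '.'), path)
        return key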

    Haiku OS

    tools used: Debian GNU/Linux sid, recordmydesktop, xmonad, fbpanel, screen, iceweasel, cheese, xcompmgr, youtube-dl, mplayer

    gopher version of this post (proxy)

  3. pheriday 0: scientist-hacker howto (video post)

    Hey everyone, here's pheriday 0, the first of paul's habitual errant ramblings (on Fr)idays

    pheriday 0: scientist-hacker howto (2012-07-20) from Paul Ivanov on Vimeo.

    Berkeley Kite Festival (510 Families)

    Merlin Mann's Most Days (specifically the travel day one on 2009-01-11)

    Sad that I missed SciPy Conference this year. One of the things I like doing at scipy is nerding it up with my friends, seeing each other's workflows, showing off vim tricks, etc. This video was my attempt at scratching that itch, a little bit. As I mention in the video, this is take 2. Take 1 ended when I ran out of disk space, but needless to say, it was more awesome than this. It seems I am cursed with losing first takes; see also a summary of last year's SciPy conference, where this exact same thing happened.

    NumFOCUS: NumPy Foundation for Open Code for Usable Science

    NumFOCUS Google Group see thread titled: "[Funding] Notes from Funding BOF at SciPy2012"

    TLDP: The Linux Documentation Project (page I was scrolling through)

    The transition to Gopher was rough this time; it was better during the first take.

    Lorance Stinson's w3m (better) gopher support. Use this if, for example, when running w3m gopher://sdf.org you get errors like:

    [unsupported] '/1' doesn't exist! [unsupported] This resource cannot be located.
    

    It still took some tweaking; shoot me an email for details.

    Robert Bigelow's About | Gopher & GopherSpace

    Here's the HTTP Proxied version of the above: Gopher proxy provided by Floodgap

    SDF Public Access UNIX System http://sdf.org gopher://sdf.org/1

    Eric S. Raymond's How To Become a Hacker Howto

    Fernando Perez' Py4Science Starter Kit

    Q: Why are you using "Chromium --incognito"? I have chronic tabitis, and this is one way of mitigating that problem. If the browser crashes or I shut down my computer, I won't have those tabs around anymore.

    programs used: Debian GNU/Linux sid, recordmydesktop, xmonad, fbpanel, screen, chromium, cheese, xcompmgr, mutt, wyrd, tail, w3m

    gopher version of this post (proxy)

  4. Ada Lovelace Day: remembering Shirley Theis and Evelyn Silvia

    In case you didn't know it - today is Ada Lovelace Day!

    Now, as any self-respecting Computer Science degree-wielding person should, I, too, think it's important to celebrate the day named after the world's very first programmer.

    For me, the first math teacher I remember making a big difference was Shirley Theis - who taught me Algebra in 8th grade at McKinley Middle School in Redwood City, CA. Mrs. Theis, an energetic dynamo in her mid-fifties, was a deeply motivated and caring teacher, who expected a lot out of her students, but never in a disciplinary manner. She was full of enthusiasm, which projected out and infected even the most timid or disaffected student: in her class, you couldn't be just a sack of potatoes planted in your seat.

    She often led class in a nearly theatrical manner - pacing back and forth, egging students on by eagerly repeating their partial responses, getting exponentially more excited if the student was on the right track, barely containing herself from jumping up and down in anticipation of that lightbulb going off -- and yet just as quickly waning in her enthusiasm, becoming a personified caricature of hopelessness and despair to let you know the instant a response was starting to go astray.

    It may have been the only math class I've ever taken where there were group assignments - we would work with a partner or a few classmates in trying to figure out an assignment, first trying it solo, and then putting our heads together to figure out why our answers disagreed and which was the right one. I believe it was Mrs. Theis who succinctly captured a value I hold in high regard: "it's not about how far you go - it's about how many people you bring with you."

    There was one other mathematics teacher I had in my life who clearly stands out: it was Professor Evelyn Silvia who had a comparable level of enthusiasm and energy, and from whom I had the pleasure of taking my first upper-division math course (Math 108 - Intro to Abstract Math) during my second quarter at UC Davis. Dr. Silvia was the real deal - she cared, gesticulated, encouraged us to question why something was true, and had an approach which demanded we each take ownership of our education. The book for the course, Introduction to Abstract Mathematics: A Working Excursion by D.O. Cutler and E.M. Silvia, was a blue workbook - each of us had our own copy, and there were blanks left for us to write our own answers to the exercises. The fact that the book had blanks for me to fill in was so inviting; there was a kind of "working mathematician" approach that came with it that made me really enjoy and look forward to working through the material. I still have mine.

    Dr. Silvia was incredibly sharp, not just intellectually but also interpersonally. Not only could she gauge when the class was lost, but she also had a knack for spotting if something was affecting you outside of class. She was really committed to helping you not just as a student, but as a person. I remember spending hours at Mishka's, or Cafe Roma, or the CoHo, reading and writing, wanting to do well and not let Silvia down, because she invested so much energy and placed a great deal of trust in us.

    So thank you both, Shirley Theis and Evelyn Silvia - you both encouraged me to grow a lot as a person, challenged my concept of what it means to be a student, and by your example provided a template of what it means to be an effective teacher, which I've imitated and embraced with pleasure in my own teaching.

    (tagged scipy to spread word of Ada Lovelace day to Planet SciPy)

  5. vim-ipython two-way integration! (updated: 2011-08-02)

    I'm very pleased to share with you a demo of the forthcoming vim-ipython integration which will work with IPython 0.11 (trunk).

    You can either use the Flash player below, or download the OggVorbis file (14MB) update: vim-ipython 'shell' demo (9.6MB). The blog-free form of this post is here.

    If you like what you see and want to try it, you can get the details from the vim-ipython github page. It currently requires 4 line changes to IPython, which are in this pull request. (Fixed to work on IPython trunk with no changes).

    Big thanks to Min for walking me through the new IPython kernel manager during the SciPy2011 sprints.

    UPDATE: 2011-08-02

    vim-ipython ‘shell’ mode.

    Just in case, here are the same videos as above, but hosted on Youtube:

    If you have any issues, try searching for your error on the vim-ipython github issues page, and if you don't find it, please file a new one, and I'll help you out there.

  6. Money and CA Propositions

    Since tomorrow we'll be having another one of those practice democracy drills here in California, I thought I'd put together a few bar charts.

    There are five propositions on tomorrow's ballot. In researching them, Lena came across the Cal-Access Campaign Finance Activity: Propositions & Ballot Measures.

    Unfortunately, for each proposition, you have to click through each committee to get the details for the amount of money they've raised and spent. Here's a run-down in visual form; the only data manipulation I did was rounding to the nearest dollar. Note: no committees formed to support or oppose Proposition 13.

    Here's how much money was raised, by proposition:

    Money Raised

    Just in case you didn't get the full picture, here is the same data plotted on a common scale:

    Money Raised (common scale)

    And the same two plots for money spent ((I don't fully understand what these numbers mean, as some groups' "Total Expenditures" exceed their "Total Contributions" and still had positive "Ending Cash")):

    Money Spent

    Money Spent (common scale)

    It could just be my perception of things, but I get pretty suspicious when there's a ton of money involved in politics, especially when it's this lopsided.

    The only thing I have to add is you should Vote "YES" on Prop 15, because Larry Lessig says so, and so do the Alameda County Greens!

    Update #1: Let me write it out in text, so that the search engines have an easier time finding this. According to the official record from Cal-Access (Secretary of State), as of May 22nd, 2010, there was $54.4 million spent in support of various propositions, most notably $40.5 million on Prop 16, $8.9 million on Prop 17, and $4.6 million on Prop 14. Compare that with a "grand" total of less than $1.2 million spent to oppose them, with a trivial $78 thousand (!!) to oppose Prop 16's $40.5 million deep pockets.
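
    (Those headline numbers come straight from the per-committee totals that also feed the plotting script below - here's a quick check in python, using the same figures:)

    # quick check of the quoted totals: expenditures of the "Yes" committees
    # vs. the "No" committees, Props 14 through 17 (same CalAccess figures
    # as in the plotting script below)
    support_spent = 4623830.07 + 264136.30 + 40582036.58 + 8932786.06
    oppose_spent = 52796.71 + 86822.79 + 78063.91 + 965218.48
    print("spent in support: $%.1fM" % (support_spent / 1e6))  # ~ $54.4M
    print("spent to oppose:  $%.1fM" % (oppose_spent / 1e6))   # ~ $1.2M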

    Update #2: The California Voter Foundation included more recent totals (they don't seem to be that different), as well as a listing of the top 5 donors for each side of a proposition in their Online Voter Guide.

    Also, here's the python code used to generate these plots:

    # Create contributions and expenditures bar charts of committees supporting and
    # opposing various propositions on the California Ballot for June 8th, 2010
    # created by Paul Ivanov (http://pirsquared.org)
    
    # figure(0) - Contributions by Proposition (as subplots)
    # figure(1) - Expenditures by Proposition (as subplots)
    # figure(2) - Contributions on a common scale
    # figure(3) - Expenditures on a common scale
    
    import numpy as np
    from matplotlib import pyplot as plt
    import locale
    # locale.currency() needs a real locale set; the default "C" locale would
    # raise a ValueError, so pick up whatever the environment specifies
    locale.setlocale(locale.LC_ALL, '')
    
    # This part was done by hand by collecting data from CalAccess:
    # http://cal-access.sos.ca.gov/Campaign/Measures/
    prop = np.array([
         4650694.66, 4623830.07    # Yes on 14 Contributions, Expenditures
        , 216050, 52796.71         # No  on 14 Contributions, Expenditures
        , 118807.45, 264136.30     # Yes on 15 Contributions, Expenditures
        , 200750.01, 86822.79      # No  on 15 Contributions, Expenditures
        , 40706258.17, 40582036.58 # Yes on 16 Contributions, Expenditures
        , 83187.29, 78063.91       # No  on 16 Contributions, Expenditures
        , 10328675.12, 8932786.06  # Yes on 17 Contributions, Expenditures
        , 1229783.79, 965218.48    # No  on 17 Contributions, Expenditures
        ])
    prop.shape = -1,2,2
    
    def currency(x, pos):
        """The two args are the value and tick position"""
        if x==0:
            return "$0"
        if x < 1e3:
            return '$%f' % (x)
        elif x< 1e6:
            return '$%1.0fK' % (x*1e-3)
        return '$%1.0fM' % (x*1e-6)
    
    from matplotlib.ticker import FuncFormatter
    formatter = FuncFormatter(currency)
    
    yes,no = range(2)
    c = [(1.,.5,0),'blue']  # color for yes/no stance
    a = [.6,.5]             # alpha for yes/no stance
    t = ['Yes','No ']       # text  for yes/no stance
    
    raised,spent = range(2)
    title = ["Raised for", "Spent on" ] # reuse code by injecting title specifics
    field = ['Contributions', 'Expenditures']
    
    footer ="""
    Data from CalAccess: http://cal-access.sos.ca.gov/Campaign/Measures/
    'Total %s 1/1/2010-05/22/2010' field extracted for every committee
    and summed by position ('Support' or 'Oppose').  No committees formed to
    support or oppose Proposition 13. cc-by Paul Ivanov (http://pirsquared.org).
    """ # will inject field[col] in all plots
    
    color = np.array((.9,.9,.34))*.9 # spine/ticklabel color
    plt.rcParams['savefig.dpi'] = 100
    
    def fixup_subplot(ax,color):
        """ Tufte-fy the axis labels - use different color than data"""
        spines = list(ax.spines.values())
        # liberate the data! hide right and top spines (by name, rather than
        # relying on the ordering of ax.spines)
        ax.spines['right'].set_visible(False)
        ax.spines['top'].set_visible(False)
        ax.yaxis.tick_left() # don't tick on the right
    
        # there's gotta be a better way to set all of these colors, but I don't
        # know that way, I only know the hard way
        [s.set_color(color) for s in spines]
        [s.set_color(color) for s in ax.yaxis.get_ticklines()]
        [s.set_visible(False) for s in ax.xaxis.get_ticklines()]
        [(s.set_color(color),s.set_size(8)) for s in ax.xaxis.get_ticklabels()]
        [(s.set_color(color),s.set_size(8)) for s in ax.yaxis.get_ticklabels()]
        ax.yaxis.grid(which='major',linestyle='-',color=color,alpha=.3)
    
    # for subplot spacing, I fiddle around using the f.subplot_tool(), then get
    # this dict by doing something like:
    #    f = plt.gcf()
    #    adjust_dict= f.subplotpars.__dict__.copy()
    #    del(adjust_dict['validate'])
    #    f.subplots_adjust(**adjust_dict)
    
    adjust_dict = {'bottom': 0.12129189716889031, 'hspace': 0.646815834767644,
     'left': 0.13732508948909858, 'right': 0.92971038073543777,
     'top': 0.91082616179001742, 'wspace': 0.084337349397590383}
    
    for col in [raised, spent]: #column to plot - money spent or money raised
        # subplots for each proposition (Fig 0 and Fig 1)
        f = plt.figure(col); f.clf(); f.dpi=100;
        for i in range(len(prop)):
            ax = plt.subplot(len(prop),1, i+1)
            ax.clear()
            p = i+14    #prop number
            for stance in [yes,no]:
                plt.bar(stance, prop[i,stance,col], color=c[stance], linewidth=0,
                        align='center', width=.1, alpha=a[stance])
                lbl = locale.currency(round(prop[i,stance,col]), symbol=True, grouping=True)
                lbl = lbl[:-3] # drop the cents, since we've rounded
                ax.text(stance, prop[i,stance,col], lbl , ha='center', size=8)
    
            ax.set_xlim(-.3,1.3)
            ax.xaxis.set_ticks([0,1])
            ax.xaxis.set_ticklabels(["Yes on %d"%p, "No on %d"%p])
    
            # put a big (but faded) "Proposition X" in the center of this subplot
            common=dict(alpha=.1, color='k', ha='center', va='center', transform = ax.transAxes)
            ax.text(0.5, .9,"Proposition", size=8, weight=600, **common)
            ax.text(0.5, .50,"%d"%p, size=50, weight=300, **common)
    
            ax.yaxis.set_major_formatter(formatter) # plugin our currency labeler
            ax.yaxis.get_major_locator()._nbins=5 # put fewer tickmarks/labels
    
            fixup_subplot(ax,color)
    
        adjust_dict.update(left=0.13732508948909858,right=0.92971038073543777)
        f.subplots_adjust( **adjust_dict)
    
        # Figure title, subtitle
        extra_args = dict(family='serif', ha='center', va='top', transform=f.transFigure)
        f.text(.5,.99,"Money %s CA Propositions"%title[col], size=12, **extra_args)
        f.text(.5,.96,"June 8th, 2010 Primary", size=9, **extra_args)
    
        #footer
        extra_args.update(va='bottom', size=6,ma='left')
        f.text(.5,0.0,footer%field[col], **extra_args)
    
        f.set_figheight(6.); f.set_figwidth(3.6); f.canvas.draw()
        f.savefig('CA-Props-June8th2010-%s-Subplots.png'%field[col])
    
        # all props on one figure (Fig 2 and Fig 3)
        f = plt.figure(col+2); f.clf()
        adjust_dict.update(left= 0.06,right=.96)
        f.subplots_adjust( **adjust_dict)
        f.set_figheight(6.)
        f.set_figwidth(7.6)
    
        extra_args = dict(family='serif', ha='center', va='top', transform=f.transFigure)
        f.text(.5,.99,"Money %s CA Propositions"%title[col], size=12, **extra_args)
        f.text(.5,.96,"June 8th, 2010 Primary", size=9, **extra_args)
    
        extra_args.update(ha='left', va='bottom', size=6,ma='left')
        f.text(adjust_dict['left'],0.0,footer%field[col], **extra_args)
    
        ax = plt.subplot(111)
        for stance in [yes,no]:
            abscissa=np.arange(0+stance*.30,4,1)
            lbl = locale.currency(round(prop[:,stance,col].sum()),True,True)
            lbl = lbl[:-3] # drop the cents, since we've rounded
            lbl = t[stance]+" Total"+ lbl.rjust(12)
            plt.bar(abscissa,prop[:,stance,col], width=.1, color=c[stance],
                    alpha=a[stance],align='center',linewidth=0, label=lbl)
            for i in range(len(prop)):
                lbl = locale.currency(round(prop[i,stance,col]), symbol=True, grouping=True)
                lbl = lbl[:-3] # drop the cents, since we've rounded
                ax.text(abscissa[i], prop[i,stance,col], lbl , ha='center',
                        size=8,rotation=00)
    
        ax.set_xlim(xmin=-.3)
        ax.xaxis.set_ticks(np.arange(.15,4,1))
        ax.xaxis.set_ticklabels(["Proposition %d"%(i+14) for i in range(4)])
        fixup_subplot(ax,color)
    
        # plt.legend(prop=dict(family='monospace',size=9)) # this makes legend tied
        # to the subplot, tie it to the figure, instead
        handles, labels = ax.get_legend_handles_labels()
        l = plt.figlegend(handles, labels,loc='lower right',prop=dict(family='monospace',size=9))
        l.get_frame().set_visible(False)
        ax.yaxis.set_major_formatter(formatter) # plugin our currency labeler
        f.canvas.draw()
        f.savefig('CA-Props-June8th2010-%s.png'%field[col])
    
    plt.show()
    
  7. Immigration in the US, contextualized (with pictures)

    So I probably don't need to tell you this since you already know, but

    Arizona sucks!

    It turns out that even documented immigrants agree, and I have the graphs to prove it!

    You see, it all started when I took a great Visualization course this past term which was taught by Maneesh Agrawala. Maneesh gave enough structure for the assignments, but also left some aspect of each open-ended. For example, our first assignment had a fixed dataset which everyone had to make a static visualization of, but the means by which we did that was entirely up to us. A lot of people used Excel (in a graduate-level CS class? gross!), some people wrote little programs (I wrote mine in python using matplotlib and numpy, and did some cool stuff that I will have to post about another time and contribute back to matplotlib), there was even a poor sap who did it all in Photoshop, as I recall, but anything was fair game. Turns out we could even just draw or make something by hand and turn it in!

    The second assignment, the source of my graphs which quantitatively demonstrate the suckiness of Arizona, required us to use interactive visualization software to iteratively develop a visualization by first asking a question, then making a visualization to address this question, and going back several times to refine the question and make successive visualizations.

    One thing to keep in mind is that, overall, naturalized citizens are both an exclusive and a discerning lot. In most cases, you have to be a permanent resident (have a Green card) for 5 years before you can apply. And there are quotas for how many people can get a Green card every year, so there are lots of hoops to jump through. Given the amount of effort involved, wouldn't it be nice to look at a breakdown of naturalized citizens by state? Because that would give us an idea about which states immigrants perceive as, for lack of a better word, "awesome", or if you're not so generous, "least sucky". I bet you'll feel that this second description is more appropriate once you take a look at the data, but keep my "least sucky" premise in mind as you read my original write-up, which focused on a different angle (but from which we can still draw some reasonable conclusions). I'll return to make a few more comments about the title of this post after the copy-pasted portion.

    here's my original write-up:

    begin cut --->

    There are three kinds of lies: lies, damned lies, and statistics.

    As an immigrant, I've always had the subjective feeling that about half of the people I'm acquainted with are either themselves immigrants, or the children of immigrants. The US prides itself in being a melting pot, a country built by immigrants, so I wanted to dive into the data that would help me understand just how large of a role immigration plays in terms of the entire country. The question I started with, for the purpose of this assignment is this:

    What's the relationship between naturalizations and births in the US?

    But what I really wanted to find out was what kind of question I needed to ask to get the answer that would be consistent with my world view. :)

    To do this, I started with the DHS 2008 Yearbook of Immigration Statistics, which was linked from the class website.

    The file I started with was natzsuptable1d.xls, which required cleanup before I could read it into Tableau. Turns out that even though "importing" to tableau format is supposed to speed things up, it seems very fragile and would regularly fail when I tried converting a column type to Number (there were some non-numeric codes, like 'D' for 'Data withheld to limit disclosure'). *NOT* importing to Tableau's desired format also had the added benefit of allowing me to change the .xls files externally, and having all the adjustments made in Tableau, without having to re-import the data source.

    Frustratingly, the last column and last row kept not getting loaded in Tableau! I also ran into an issue which I think had to do with the 'Unknown' country of origin and 'Unknown' state of naturalization which made the totals funky. It took a while to figure out, but there was a problem with Korea, because there was a superscript 1 by it, indicating that data from North and South Korea were combined.

    I was trying to use the freshest data possible, so I used the CDC's National Vital Statistics System report titled Births: Preliminary Data for 2007. I just had to copy-paste the desired data, and massage it to fit the proper order of columns in the excel table I already had handy. I put zeros for U.S. Armed Services Posts and similar territories, which is probably not accurate, but this data was not available in the reports that I found. Interesting factoid: according to NVSS (CDC), in 2007 there were more people born in NYC than in the rest of the state combined (about 129K vs 126.5K). The only caveat with this data is that it covers only 98.7% of births. The states with some missing portion of their data tabulated are Michigan (at 80.2% completeness), Georgia (86.4%), Louisiana (91.4%), Texas (99.4%), Alaska (99.7%), Nevada (99.7%), Delaware (99.9%). Thus, state-level analysis for MI, GA, and LA may be distorted.

    The data I had from DHS is for Fiscal Year 2008, which, as it turns out, goes from October 1st, 2007 to Sept 30th, 2008. Thus, no matter which combination of NVSS and DHS datasets I used, there would necessarily be a mismatch in the date range covered by each, so I settled on describing my visualization as "using the latest available data", noting the actual dates for each dataset in the captions. Also, the NVSS report contained a graph of births over time, which fluctuates very modestly from year to year, thus the visualization would not change qualitatively if I had 2008 birth data on hand.

    I was having a really hard time trying to get a look at the data I wanted to see in one sheet, and ended up trying to make a dashboard that combined several sheets. I couldn't figure out a good way to link the different states across datasets. I struggled for quite a while to pull out the data that I wanted to look at, and ended up having to copy-paste everything from DHS and NVSS (transposed) onto a new sheet in Gnumeric.

    Here's the result:

    [Figure: Initial visualization]

    So, in all of the US, about 1 in 5 new American citizens is an immigrant, or for every four births, we have one naturalization. That was kind of unsatisfying. I've lived in California the entire time I've been in the US, and I feel that at least California is more diverse than that. There's all those states in the middle of the country that few people from the rest of the world would want to immigrate to, yet the people living in them are still having babies, throwing off the numbers which would otherwise support my subjective world view...

    So I decided to look at the breakdown by state.

    Broken down by state, what's the relationship between naturalizations and births in the US?

    [Figure: my second iteration]

    I added the reference lines so that you could both read off the approximate total more easily, and be able to do proportion calculations visually, instead of mentally. This started looking promising, as I've only lived in California, and it looks like it's got quite a lot of immigrants as a portion of total new citizens.

    It was still kind of hard to see the totals, so I decided to create my very first calculated field - which had the very simple formula [Births in 2007]+[Total Naturalized]. Using this new field, I could now make a map, to see the growth broken down geographically. This was just a way of reaffirming my earlier bias against the middle states having babies without attracting a sufficient number of immigrants to conform to my world view.

    [Figure: gratuitous map (was too easy to do using the software)]

    In the breakdown by state bar graph, it was also difficult to visually compare the total births by state, because they all started at a different place, depending on the number of naturalizations for that state. So I decided to split the single bar and make small multiples for each state.

    [Figure: back to something more interpretable]

    It's interesting that the contribution of naturalizations slightly changes the ordering of the growth of states. For example, Florida has fewer births than New York, yet its total growth is larger, because it naturalized 30,000 more people than New York. With this small multiples arrangement, it was now possible to do positional comparisons across categories, not just between naturalizations and totals. Turns out that more people get naturalized in California than are born in the entire state of New York. And since New York has the third highest number of births annually, more people got naturalized in California than are born in any state other than CA and TX.

    This was too large of a graph, and the story I'm interested in is really the ratio between births and naturalizations (the closer to 1:1, the better), so I made another calculated field, which is exactly such a ratio, multiplied by a factor of a thousand, so I could give it a sensible description (Naturalizations per 1000 births). This refines my question:

    For every 1000 people born in the US, how many immigrants become naturalized?
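
    (The calculated field itself is nothing fancy; in python it would just be something like the snippet below, with made-up placeholder numbers standing in for one state's row:)

    # the "Melting Pot" measure: naturalizations per 1000 births
    # (numbers here are placeholders, not values from the dataset)
    births, naturalized = 500000.0, 250000.0
    melting_pot = 1000 * naturalized / births
    print("%.0f naturalizations per 1000 births" % melting_pot)  # -> 500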

    I then ordered on these ratios, and decided to filter the top states. Guam would have made the cut, but it is not a state, and (though I didn't mention it earlier) its NVSS birth data was only 77% complete, so I excluded it. Fifteen is a nice odd number, but it actually marked a nice transition, as after Texas, everything else is less than 200 naturalizations per 1,000 births.

    The small multiples bar graphs still looked too busy, and there was redundancy in the data, which didn't tell a succinct story. So I switched to looking at just the ratios. This revealed that, indeed, the fact that I've been living in California makes my perspective quite unique, as it is one of three states, along with Florida and New Jersey, to have an outstandingly large number of naturalizations compared to births. It is so high, indeed, that it puts the naturalizations-per-birth rate in these three states at more than twice the national average!

    Looking at the ratio alone tells us about the diversity in each state's growth, but it carries more meaning in the context of total growth. Thus, I added the combined totals (naturalizations and births) as a size variable, for context. The alternating bands both make it easier to read off the rows, and aid the comparison of sizes by framing every data point in a common reference window. It makes it obvious that California is the state with 864,261 new citizens, because it fills the frame completely.

    Final question: What are the Top 15 "Melting Pot" States?

    [Figure: almost done, would be nice to include context from the visualization I started with]

    Ordering the data in this way also shed light on the small but still very diverse states that would not have otherwise made the cut (and did not pop out in any manner on my previous bar graphs). Rhode Island and Hawaii got it going on, in terms of attracting immigrants.

    Certainly the fact that I'm an immigrant myself also greatly influences whom I associate with, further skewing my world view towards a 1:1 ratio, but I'm actually quite impressed with just how close to that ratio California is - 1:1.9. Of course, the data I've analyzed does not include the American-born 1st generation of children, nor does it take into account the number of immigrants living in the US that do not have citizenship. All of these factors would surely push the ratio even closer toward 1:1.

    I decided to combine the US total growth information, since it gives further perspective on the entire data set, such as the fact that California accounts for about 16% of total US growth. It also sheds light on how the US average was calculated. A new "twice the nat'l avg" line makes explicit the three most diverse outlier states mentioned before. I also changed the colors to match the convention used in the bar charts made earlier. The US combined total line semantically links the data plotted with the national growth bar chart - i.e. the green dots are formed by the sum of born and naturalized citizens.

    [Figure: What are the Top 15 "Melting Pot" States?]

    <---- end of cut

    Ok, so, to be honest, it turns out that I wrote a large chunk of this post (Arizona suckage included) before I actually looked back at my visualizations, only going off my memory that it wasn't in the top 10. So Arizona is just below the national average in this "Melting Pot" ratio (a measure I made up, the number of naturalization per 1000 births). Since it is #12, some might say, "Paul, Arizona's on your top 15 list", to which I'll reply: "So's Texas."

    I guess I just wanted to share these purdy graphs I made a few months back, and it seemed like there was a somewhat topical angle on them a few weeks back, when I remembered that I hadn't posted them on here yet. Anyway, I'd love to hear back your thoughts.

  8. on facebook, you die a quiet death...

    2010 02 27 technology

    So I finally got around to quitting facebook. I came to the HackerDojo tonight, and Waleed was working on sending out notifications for users of his app, because Facebook is disabling notifications as of March 1st.

    Notifications was the way I was planning to announce my departure, because the "app" I created for that purpose would still be around after I deleted my account. But I was only able to send out a few of these notifications before facebook would not let me anymore.

    Here's the message (with the cool icon I made) as it appeared on Facebook: [Figure: Notification]

    But that wasn't going to stop me: so I made a status update - which contained essentially the same message:

    Hey guys, as I alluded to in a status update a few weeks back:
    
    I'm leaving Facebook.
    
    This walled garden isn't the way the web was meant to be, and I *refuse* to
    continue participating in it.
    
    You can always find me by either searching for "Paul Ivanov", or through the
    various means listed [on my website](//pirsquared.org/personal.html).
    
    best, 
    -pi
    

    And then I was off to "Deactivate" myself. Beware - Facebook will play on your heartstrings by telling you that a couple of your friends will miss you - putting their photos right there and beckoning you to message them - but I was prepared to deflect this Faustian bargain. "Don't go! Ashley and Andrew will miss you!!!" Though after I said "yes, really", it asked me for my password, and following that also sent a CAPTCHA! ("the escapade" indeed!)

    "the escapade"
indeed

    and then, just like that: POOF! I vanished. Without any trace. The fake temporary account I created had no friends - my last (and all) status updates lost in the ether (at least as far as facebook users are concerned). But rest assured that the data lingers - permanently - as facebook was immediately suggesting that my new fake account, which now had no friends to speak of, connect with the people I had been friends with, even though my account was gone.

    For most of my friends, I got your birthdays, schools and jobs, as well as the most recent photo, and all of the friend-to-friend connections among you. For most, I actually had a stale copy (of everything but the photos) from 2007, which is when I originally wrote the export scripts I used, back when Facebook was just starting to open up its API and quickly devolving into a MySpace clone. It's not like I even frequently used the site - but reading about this reminded me to quit for good.

    So I finally did.

  9. Publisher's Block

    2009 12 26 technology

    vim

    One of the reasons I find it so difficult to get more than a couple of entries in per year is that I know they aren't going anywhere after I post them. They're sticking around for a while, and if they're full of trivial crap then that doesn't reflect very well on me. Posting about trivial stuff was ok when I was still trying to establish a sense of identity. These days, when I write something public, say on a mailinglist, I agonize over every detail because I know that this digital breadcrumb with my name attached will be around forever. So I keep raising the stakes for myself, neurotically checking over every possible extra whitespace in a patch I send in, sinking hours into something that should have taken 15 minutes.

    I'm finally getting to the point where I realize it's a problem that, for example, even when I'm texting someone, I try to get all of the spelling and punctuation correct.

    It's slowing me down.

    I've had a lot of half-written blog posts that, after stepping away from them for a short while, just don't seem significant enough. I try to only publish pieces that either I think about for a while, or that I'm not hearing/reading others write about. But I'm always mindful about adding noise. The way I see it, when it became super easy for anyone to publish online, a lot of content flooded in that I simply don't care for. Same idea with web 2.0 - because of Ruby on Rails, Django, and other web frameworks, writing a fancy (but useless) website became super easy - and now we're oversaturated with them ((Though this problem will probably sort itself out with time. I didn't intend to write about this now, so I'll just keep that remark without developing it further)). So there's this internal tension: I think there's too much crap-content out there, but at the same time my internal filter keeps me from publishing anything. I rarely express my thoughts about what I find important in writing anymore. Others don't seem to make such a big deal about self-filtering, and are much more prolific writers/bloggers/coders, etc.

    LTS

    So here's a new acronym-sized motto to help correct this behavior, which is starting to get sprinkled in comments in the software I'm writing for my research: LTS. Life's too short.

    LTS

    I use it as a reminder of what in the past was one of my frequently used maxims: most things in life are pass or fail. This doesn't mean that it's ok to do a half-assed job on everything, but given that there's a limited amount of time, I should focus my efforts only on that which is truly important. Typos in a text message or extra trailing whitespace do not qualify as such.

    I wasn't always this careful about what I publish. I've had some form of internet presence (as embarrassing as it may seem now) since I was in middle school. It started in one of those geocities neighborhoods, I don't even remember any details right now, probably because my brothers helped me to set it up. I didn't use my real name until I started a poetry website freshman year in high school.

    I used my full name, because I wanted to express my thoughts and have them be connected back to my persona, not a pseudonym that I might grow tired of. I was quite explicit about this at the time. And I didn't filter myself; I just counted a total of 20 poems on there which were written in the course of a year. None of them really make me cringe, and some I'm still quite proud of.

    I had nothing to gain by hiding behind an alias. I think that attaching my real name somehow made my thoughts sincere. I started blogging socially my senior year in high school (livejournal), and looking back on the first entry there, I was just trying to capture day-to-day events and thoughts. Vim, THE editor, is mentioned five times in the first two entries :) . But there are some very candid and thoughtful remarks in there, too.

    It's kind of funny to have your more-than-10-year-old website cited in a Yahoo! Answer to the question: "What is the best way to live life to the fullest?". Basement cited! I mean, it is yahoo answers, we're really scraping the bottom of the barrel when it comes to content ((in fact, Elaine absolutely refuses to read anything on that site anymore, despite the fact that frequently, her google search string is verbatim the same as the question which comes up as one of the top results)), but it's still cool. Yeah, ok, so it's doubly embarrassing because the citation is just for the lyrics to "The Sunscreen Song". I'm ok with that.

    And I'm very grateful for my many friends and colleagues who, by their example, continue to give me the courage to release my thoughts and code out in the open. Thank you.

    As I was putting my finishing touches on this post, I found a recent entry on Scot Hacker's blog titled "(I Don’t Care About) Facebook and Privacy" that covers similar ground: "For me, it’s simple: If what you have to say shouldn’t be said to the whole world, then don’t say it online." I agree, and it's a more sensible standard than my "everything you say will forever be connected to you, so don't screw it up!" But just to be clear, this should only apply to things you intend to write up and release: I absolutely oppose Eric Schmidt's dismissal of privacy. Eric says, "If you have something that you don't want anyone to know, maybe you shouldn't be doing it in the first place." Due to its construction, it bears striking similarity to Scot's quote above with which I mostly agree. But to me, Eric's statement is a 1984-sized world apart.

    Anyway, hopefully I've adequately explained my "publisher's block", and there are many related topics left to explore, but this is where I'll have to end this post for now. LTS.

  10. Standing up to the Madness is an excellent read

    2009 05 02 books

    Standing up to the Madness: Ordinary Heroes in Extraordinary Times

    My labmate Tim sent me an email on Wednesday (April 15th) saying that Amy Goodman ("Democracy Now! fame, and my heroin" [sic]) was speaking on campus at noon. The place was packed, and it's the best way I could have imagined to snap back out of the Qualifying Exam bubble I've spent the last several months in, and re-engage with the world at large.

    One of the excuses for the tour is the paperback release of Standing up to the Madness: Ordinary Heroes in Extraordinary Times by Amy and David Goodman.

    Now that I'm a tenured grad student, I can actually allow myself to read for pleasure - guilt free! So I went to the library that Thursday, and picked up the hardcover, which came out last year.

    What I liked about this book is what sets it apart from other political books of today: Amy and David don't just provide us with a laundry list of wrongdoing by the Bush administration, Congress, and various governmental agencies, nor simply highlight some of the ongoing local struggles. Though the book is chock-full of such details, they are all provided in the context of a particular vignette. What's more - instead of simply stating the problems, or providing an outline of the authors' opinions regarding what course of action should be taken, the book highlights the work average citizens have already done to oppose injustice, censorship, racism, etc. One example is T-shirt "terrorist" Raed Jarrar, who wore a shirt with the words "We will not be silent" - written in both English and Arabic - a reference to the White Rose - and was forced to put another shirt over it because JetBlue customers were threatened or offended. With the help of the ACLU, Jarrar sued the TSA and JetBlue, who ended up paying $240,000 to settle the discrimination charges.

    Like Hochschild's King Leopold's Ghost ((which, after I first read it in 2001, became my measuring stick for gauging the quality of non-fiction)), this book is non-fiction that reads like fiction. Not because it is well-written, though it is, but because of the shocking realities of the content. Leadership cannot be taught, it can only be revealed. Standing up to the Madness gives us dozens of snapshots of the ongoing work of ordinary heroes.

