Current Challenges in Free Software and Open Source Development

Paul Ivanov
https://pirsquared.org/talks/badcamp

Bay Area Drupal Camp 2024




# whoami
pi
Long time Scientific Python ecosystem participant:
  • Matplotlib
  • IPython
  • Jupyter
Previously worked at:
  • UC Berkeley
  • Disqus, Noteable
  • Bloomberg, Citadel

Currently working on spines.dev
(Jupyter Notebook Search)

Contributors

Itay Dafna

Matthew Turk

Juan Nunez-Iglesias

David Nicholson

Carol Willing

# whoami
pi

1984 - same year as :

  • the Macintosh

challenges

  • Training and using code assistants for license laundering
    • LLMs as MLMs
  • Mixing paid and volunteer labor
    • Oil and water
  • Relicensing of Open Source Software
    • Silver spoons and forks

License Laundering: The Pyramid Scheme(s)

  • normalizing plagiarism

  • If the schemers can corrupt enough other parties,
    corruption won't be seen as a problem

  • Where did the code come from?

    we don't know and we don't care

The state of open source (GitHub, 2023)
  • Speaking of generative AI, almost a third of open source projects with at least one star have a maintainer who is using GitHub Copilot.

  • Open source maintainers are adopting generative AI. Almost a third of open source projects with at least one star have a maintainer who is using GitHub Copilot. This follows our program to offer GitHub Copilot for free to open source maintainers and shows the growing adoption of generative AI in open source.

We now have over 1.3 million paid GitHub Copilot subscribers, up 30 percent quarter-over-quarter, and more than 50,000 organizations use GitHub Copilot Business to supercharge the productivity of their developers

Copilot accounted for over 40% of GitHub's revenue growth this year and is already a larger business than all of GitHub was when we acquired it.

GitHub Copilot customer agreement

2. Ownership of Suggestions and Your Code. GitHub does not own Suggestions. You retain ownership of Your Code.
3. Responsibility for Your Code. You retain all responsibility for Your Code, including Suggestions you include in Your Code or reference to develop Your Code. It is entirely your decision whether to use Suggestions generated by GitHub Copilot. If you use Suggestions, GitHub strongly recommends that you have reasonable policies and practices in place designed to prevent the use of a Suggestion in a way that may violate the rights of others.

Secondary Pyramid Scheme

March 2024 Communications of the ACM Article: Measuring GitHub Copilot's Impact on Productivity

What is the single most successful project?

  • Linux kernel

what license does it use?

  • GPLv2

  • which Drupal also uses

BigCode is an open scientific collaboration working on the responsible development and use of large language models for code

The BigCode project trained StarCoder code generation LLM using The Stack.

The Stack v2 dataset is a collection of source code in over 600 programming languages.

democratise the software creation process

Are you in the Stack?


There's an opt-out form

https://github.com/bigcode-project/opt-out-v2/issues

It's a general characteristic of social systems that people have to have confidence in the future... If property rights are provisional then your value of property changes accordingly

Gabe Newell: "On Productivity, Economics, Political Institutions & The Future of Corporations: Reflections of a Video Game Maker" https://www.youtube.com/watch?v=t8QEOBgLBQU. (2013)

A License to Kill The Community

Tearing up the social contract we operated under
for the past 40 years. The erosion loop has begun.

we will become part of the sediment

Well, that's just like, your opinion, man.

Basically, I switched from busybox to toybox because of licensing. I try not to care about licensing anymore. I ran the experiment: copyleft did not help add code to busybox.

Toybox is licensed under Zero-Clause BSD license.

(No attribution necessary)

SCO's monkey trial

SCO made it clear that, in its opinion, Linux was stolen property: "It is not possible for Linux to rapidly reach UNIX performance standards for complete enterprise functionality without the misappropriation of UNIX code, methods or concepts".

Microsoft, which had not yet learned to love Linux, funded SCO and loudly bought licenses from the company. Magazines like Forbes were warning the "Linux-loving crunchies in the open-source movement" that they "should wake up". SCO was suggesting a license fee of $1,399 — per-CPU — to run Linux.

Paths forward

Provide data sets of, and train only on, code under MIT-0, Zero-Clause BSD, and similar effectively public-domain, no-attribution licenses
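As a rough illustration of what "train only on no-attribution licenses" could look like in practice, here is a minimal sketch of an allowlist filter. Everything in it is an assumption made for illustration: the `SourceFile` record, the `filter_corpus` helper, and the choice of SPDX identifiers in the allowlist (`MIT-0`, `0BSD`, `Unlicense`, `CC0-1.0`) are hypothetical, and this is not how BigCode or any other project actually builds its dataset. License detection itself is assumed to have happened already.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Hypothetical allowlist of SPDX identifiers for licenses that do not
# require attribution. Illustrative only, not a legal determination.
NO_ATTRIBUTION_LICENSES = frozenset({"MIT-0", "0BSD", "Unlicense", "CC0-1.0"})


@dataclass(frozen=True)
class SourceFile:
    """A candidate training file (hypothetical record type)."""
    path: str
    spdx_license: Optional[str]  # None when no license was detected


def is_trainable(f: SourceFile) -> bool:
    """Keep a file only if its detected license is on the allowlist.

    Files with an unknown license (None) are rejected: opt-in, not opt-out.
    """
    return f.spdx_license in NO_ATTRIBUTION_LICENSES


def filter_corpus(files: Iterable[SourceFile]) -> List[SourceFile]:
    """Return only the files that may be used without attribution."""
    return [f for f in files if is_trainable(f)]


if __name__ == "__main__":
    # Made-up example records; paths and licenses are for illustration.
    corpus = [
        SourceFile("a/main.c", "0BSD"),
        SourceFile("b/fork.c", "GPL-2.0-only"),
        SourceFile("c/util.py", "MIT"),
        SourceFile("d/unknown.js", None),
    ]
    print([f.path for f in filter_corpus(corpus)])
```

Note that plain `MIT` and `BSD-3-Clause` are rejected here along with the GPL: they are permissive, but they still require attribution.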

This motion can be revisited, should a case been made over such a tool that does not pose copyright, ethical and quality concerns.

Mixing Paid

    and

Volunteer Labor

Mixing Paid (oil)

    and

Volunteer Labor (water)

Keith Packard

A Political History of X: or How I Stopped Worrying and Learned to Love the GPL (LCA 2020)

MIT X consortium

Get everyone involved – Well, at least every workstation vendor willing to write big checks

  • Would answer email from the people paying him first

Spectrum of volunteer to paid...

student, grad student, academic, hobbyist, professional

  • taking a lower-paying job to continue doing this thing which I value

  • academic vs industry vs hobbyist started and oriented projects

other axes:

  • self <---> society

  • journey <---> destination

  • Athenian <---> Spartan

  • dogmatic <---> pragmatic

  • silly <---> serious

  • courteous <---> abrasive

  • pushing <---> steering

  • emotional <---> intellectual

If the pyramids had a commit log,
they could have been built by volunteers.

@ivanov
  • Thought experiment: Would you pitch in as a volunteer to help build the pyramids knowing that others are paid for their labor?

The tension is always there -

  • the only time it isn't is when the project is hobbyist-only (of no commercial interest) or purely commercial (where the hobbyists have been squeezed out)

Eric von Hippel: User innovation

Users innovate for their own use

Producers have to worry about the size of the market

2024 Tidelift State of the Open Source Maintainer Report

  • size of user base
  • height of entitlement
  • volume of outrage

Silver Spoons

Lead to

Later Forks

why are they doing that?

  • to make money
  • fear of hyperscalers
  • insufficient ROI from failure to foster contributors

what are users doing in response?

  • largely forking

Philanthropy

  • that's how free and open source software should be seen
  • we're all still figuring it out, and most people are trying their best.

other disruptive changes

  • left-pad incident (2016), or more appropriately the kik incident: NPM sides with a company, developer takes his stuff and leaves.

Not driven by logic, anger, or greed. It was a decision guided by my heart. And it came from a simple principle: if NPM breaks its own rules to remove one of my packages, they should remove all of them.

-- Azer Koçulu (June 2024)

  • ongoing WordPress saga (2024-)

The Rage Pyramid

  • size of user base
  • height of entitlement
  • volume of outrage
Current Challenges in Free Software and Open Source Development
  • Attribution theft: demand ethically-trained code assistants (MIT-0, 0BSD, Public Domain); refuse to use others
  • Tension between paid and volunteer work will emerge in all popular open source projects
  • Forking is a right, adjust expectations accordingly
Paul Ivanov, BADCAMP 2024


# Current challenges in free software and open source development

### A talk in Three Pyramids

Thank you for coming to this talk.

A little about you: developers? community managers? writers? designers?

Make reference to the competing sessions

__footer: "Paul Ivanov, BADCAMP 2024"

running Stoic Hedgehog LLC

this is my ACM Software System Award. ACM is the Association for Computing Machinery, *the* professional organization for programmers and computer scientists. Founded in 1947 - 77 years ago. The award is a glass trapezoid sitting on its side. The books in the background are foreshadowing... chilly emoji - the spice must flow..

I want to thank some people who helped me think through some of the things in this talk

But enough about them, let's get back to talking about me... TODO: Challenges in Open Source reading and watchlist

Steven Levy's ("Leeeevy") book <u>Hackers: Heroes of the Computer Revolution</u>

> It was at this conference that Richard Stallman first publicly and explicitly stated the idea that all software should be free, and makes it clear that “free” refers to freedom, not price, by saying that software should be freely accessible to everyone. This was probably the first time he made that distinction to the public.

-hackers - wizards of the electronics age

if you go back 40 years before I was born - there were NO computers

* ENIAC was completed in 1945
* Harvard Mark I in 1944
* FORTRAN - the oldest programming language still with us today - finished its specification in 1954 - only 70 years ago (took another 3 to get a compiler)
* COBOL finished in 1959, LISP in 1960

..give people incentive

https://web.archive.org/web/20161107235202/https://www.gettyimages.com/detail/video/at-the-first-hackers-conference-in-1984-richard-stallman-news-footage/146485179

there'd be no cloud computing without open source and free software

Linux kernel developer and maintainer Greg Kroah-Hartman said in a talk that Debian runs 70% of the world (cloud vendors) - 80%+ run non-commercial distributions. "redhat, suse, ubuntu are great distros"

LAMP stack

https://www.youtube.com/watch?v=at-uDXbX-18&t=1850s

## topics of concern

GitHub Copilot, and other large language models for code completion trained on publicly available software with no regard for the licenses of that software, act as a license laundering cudgel that denigrates the work of open source developers who have contributed their code under a legal framework for how their code can be used. If their code was contributed under a copyleft license, like GPL (v2 or v3), the expectation is that no user of their code will ever lose the ability to modify their code. If it was contributed under a permissive license, like BSD (2 or 3 clause) or MIT, the expectation was that inclusion of their code would result in an attribution to the project they contributed to. Tools like GitHub Copilot make all of those expectations of the original authors null and void, enabling theft of their work, all in the name of making it "easier" for other programmers.

I'm not going to go all Coffeezilla here.

I called this a pyramid scheme in the abstract for this talk, and here's what I mean. It's also a heist - on the scale of stealing everything in the Louvre - talk about value capture...

What do we need for a pyramid? We need some way to entice newcomers to join, and we need someone to be benefitting at the top. Let's start with the base, and work our way up.

If enough other people participate in the plagiarism, we can expand the uni

Those who start the earliest get

diffusion of responsibility:

let's take a look at what the corruption pyramid looks like at its base

don't fly solo - try GitHub for free - the evil empire telling you to come to the dark side, we have cookies

Admiral Ackbar - It's a trap

let's check in on how the corruption is going --

* > In 2023, developers made <b>301 million total contributions to open source projects across GitHub</b>

that ranged from popular projects like Mastodon to generative AI projects like Stable Diffusion, and LangChain.

Commercially backed projects continued to attract some of the most open source contributions—but 2023 was the first year that generative AI projects also entered the top 10 most popular projects across GitHub.

five sentences later, they feel the need to REPEAT this claim

increase in fires follows our program to hand out lighters to every teenager, squirrel, and raccoon

I am not going to convince them - the "makers" of these tools but there's a different way to live.

Q2 end of January, q4 end of July

MSFT acquired GitHub 6 years ago in 2018, for $7.5 billion - which then had 31 million developers

If you license either Copilot Business or Copilot Enterprise directly from GitHub, then this document applies. So they get users' money, but users get the potential infringement suits.

I'm not stealing other people's work without giving them credit - I'm just making a suggestion

secondary pyramid scheme

primary pyramid scheme of attribution, secondary is skill acquisition

Joe Armstrong - one of the creators of Erlang

I assert that the primary beneficiaries of copilot are novice programmers. In other words - instead of teaching people how to fish, copilot and tools like it are selling fish. And indeed there has been some evidence coming out to say this is the case: in one study, three groups were assigned a programming task. One had access to a chat-based code assistant, one had AI-based completion, and one was just allowed to use a search engine. The group using the chat finished first, but struggled to explain it. Those using only a search engine took the longest, but could competently explain and manipulate their code afterwards.

more like impact on erosion of professional ethics - github copilot's integrity?

We are sitting here -- I'm supposed to be an ethical professional, and we're talking about productivity. I mean - listen: We talking about productivity? Not ethics. Not ethics. Not ethics. We're talking about productivity. Not ethics, that we go out there and project to the world.

They posted a video and I was fuming -- So I did something about it - I was going to leave a comment on a video

there are two vowels that are very hot right now, and I prefer to replace them with two consonants: B and S... This company has done the opposite of what it set out to do, so I refer to it as Closed-BS

Speaking of MIT license... let's go to the first project to use it

let's get into the weeds a bit

importantly, all of the example licenses listed here - the most popular ones, require attribution

GPL: allows incorporating permissive licenses but encumbers author to continue to publish the combined work under the GPL

(since 1991 - linux had its 33rd birthday recently)

though via rob landley i found out 'linux was "no commercial use" until after 0.12 release' http://landley.net/talks/ohio-2013.txt - 0.95 release in 1992 https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?h=0.99&id=e6c7a63f3cc9898b82d65ac3bda90d543a471c17

you can see the problem here - once something is GPL, its derivative works should also be GPL? Did github copilot exclude GPL code during training of their code assistant? I don't know - but I know that the folks from BigCode project *did* just that...

bigcode is supported by Hugging Face and ServiceNow Research

the stack v2 has 32 Terabytes of code

let's look in aggregate at the kinds of licenses found in the stack

some things they did right: left out GPL code up front

what's wrong with that? -- there are other conditions outside of the kind of work derivative works may be combined with, namely all of those licenses stipulate ATTRIBUTION for the original authors of the work

have they considered that there's code on github that predates github?

but what the hell, man? if only there was some way for authors to indicate how they would like their code contributions to be used in the future

we can come back and check at the end of the talk

https://youtu.be/t8QEOBgLBQU?t=3140

Social contracts sustain our institutions. tragedy of the commons? OSS-tensible challenges FOSStering future developers?

erasing licenses is a kind of license

are technology companies cannibalizing their mycelium roots and the value they benefitted from? Good job, we were giving all of this stuff away without cost and you managed to pirate it anyway

how much time left to sterility?

Who's squeezed between Sean Connery and Roger Moore? It's George Lazenby via Matt - Peter sellers is missing

Some prominent developers no longer maintainer of busybox and creator of 0-bsd

We tried, we enforced it, and then when I had run the experiment and proven the negative I couldn't get them to stop

companies throw out GPLv2 also, Apple pushes LLVM

SCO's monkey trial: 2003

It was the claim of access to Unix code that was the most threatening allegation for the Linux community.

To rectify this "misappropriation", SCO was asking for a judgment of at least $1 billion, later increased to $5 billion. As the suit dragged on, SCO also started suing Linux users as it tried to collect a tax for use of the system.

toybox is zero-clause bsd, sqlite is public domain

Gentoo Linux put forth a policy in March of this year

> It is expressly forbidden to contribute to Gentoo any content that has been created with the assistance of Natural Language Processing artificial intelligence tools.

NetBSD created a similar change in May

#### Matthew Butterick

#### GitHub Copilot Litigation

#### <div style=""><a href="https://githubcopilotlitigation.com"><wbr>github<wbr>copilot<wbr>litigation<wbr>.com</a></div>

# Oil and water: the effort and expectation differences of mixing paid and volunteer work.

Popular collaboratively developed software can attract a mixture of monetarily compensated development as well as the hobbyist unpaid labor that nourished the software in its nascency. "Business critical gift economies" don't exist, but projects and their participants vary on a spectrum from those two opposite poles, and positions shift over time. Paradoxically, this tension can be both productive and detrimental at the same time, depending on the perspective of those involved.

let's make it a bit darker, more conspiratorial

oil can take a lot more heat -- water is abundant, but won't make you french fries

paid - dirty money... clean pristine water... yeah, i'm a hippie

keith worked on X window system, fontconfig, cairo

Why? So here's that story - it's 1988

note the contrast of the license preferences here: whereas rob landley, who had been a fan of the GPL, ends up switching to BSD, keith packard says GPL was the better choice: it would have forced competitors to work together, instead of going off and doing their own thing.

keith's slides 12, 13, 14:

X as Corporate Tool

* Jim Gettys and Smokey Wallace
* Write X11, release under liberal terms
* Displace SunView (then dominant proprietary display protocol)
* “Reset the market”
* Digital management bought this plan

MIT X Consortium

* Hired dev team at MIT
* Funded by Consortium members
* Members also voted on standards

Where Was The GPL?

* We knew Richard too well
  * The GPL's worst sponsor
* Corporate sponsors dedicated to non-free software
  * Pay for say turns out to have powerful control

slides 28 and 29

why are you talking about this? What's old is new again

fame can be the goal, self-promotion

are they doing something to fill their heart, or their wallet? both?

related to the technology that insists on monopolizing your attention from the previous talk

your reasons not their reasons

journey: experience - destination: end result "product"

you vs others

experience vs end result

egalitarianism vs hierarchy

and hopefully sane vs automattic

team vs product

openbsd and 9front good examples of abrasive

work vs diversion from work

![](images/pyramid_tweet.png)

go into personal history of jupyter project

pyramids were built largely by paid laborers, not slave labor (as asserted by Herodot-us) "Graves of the builders who worked on the Pyramids of Giza were first found in 1990." -

how do you balance paid and volunteer work? Both as a paid developer, and as a project overall.

(3a) if you identify a great volunteer and start paying them for their work, when/if the money runs out, you can see how they might actually stop both paid and volunteer work.

(3b) within a project, other volunteers might feel like hey, that dude is getting paid, why would I do work for free?

money can be a deterrent for volunteers to contribute to your project, because the project appears well resourced

a focus on money from businesses that benefit from open source - this talk is complementary - can you throw money at education? poverty? drug abuse? violence?

I don't have a smart phone, don't shop at big river company, don't use eff book

if a community has hobbyists

Jupyter Foundation - a case of the hobbyists being squeezed out - or an affirmation of this

Matt points out that "given copilot's voracious maw, seems like it's now a bit harder to say [when a project is hobbyist only], since the usage of hobbyist code in commercial products is basically unknowable"

i might start off as a hobbyist, do something clever, and get a job offer out of it, or some financial support, or a grant....

also, the state of "purely commercial" open source isn't great, because innovation happens from individuals, not from organizations

two approaches: making platforms that are easier to commercialize

make a commons: diffusion mechanism

(via Carol Willing)

skateboard example - Back to the Future

user innovation - not producer innovation

had to change definition of innovation

it can be lonely - but some love to tinker

Burnout galore -- no talk about open source would be complete without mentioning burnout

recall that property of oil to hold heat: money can be salve for when the temperature gets cranked up -- end up in a Dune world - all the water will evaporate

### one final pyramid - what's this?

# The Formula

formula for the volume of

* $V$ - volume of outrage
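
Taking the Rage Pyramid literally, the textbook formula for the volume of a pyramid gives a plausible reading of the formula referred to here. This is a sketch under assumptions: only $V$ is labeled in these notes, so the symbols $B$ (base area, read as size of user base) and $h$ (height, read as height of entitlement) are introduced for illustration.

$$V = \frac{1}{3} B h$$

On this reading, outrage scales with the product of the two: a large user base with modest entitlement, or a small but highly entitled one, can produce the same volume.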

### the third and final pyramid - what's this?

formula for the volume of

# Pay the piper: relicensing trends in commercially supported source-available projects.

With increasing frequency, previously open source software projects backed by a commercial entity are shifting away from traditional licenses. Recent examples include Sentry, Terraform, Redis, and CockroachDB. Why are they doing that and what are affected users and developers doing in response?

there's a lot to dig into here, just depends on how much popcorn you have

Redis -> Valkey, HashiCorp -> OpenTofu, CockroachDB

hashicorp bought by IBM for $6.4B

the fork was already done by the BUSL or other "source available license"

time for self flagellation

you can't be mad if a donor you relied on no longer gives to your cause

well, you can get mad about anything, but that's a Hulk other problem

<p>much like you shouldn't begrudge a past-time top contributor not being around anymore because they wish to spend their time elsewhere, you also shouldn't begrudge a business that is choosing to spend its money elsewhere </p>

(not me, obviously, but most *other* people)

> Fair Source is an alternative to closed source, allowing you to safely share access to your core products.

#### Sometimes, they change their mind back - Elasticsearch

Left-pad was like a "death" and "re-birth" moment for me. The part of me passionate about open-source was dead, and something new took over. Now, I'm passionate about business, marketing, running companies / teams in different ways, as much as I'm about programming.

is Bitwarden next?

Let's recap : CONCLUSION

rage still applies here: previously used this in defense of maintainers, this time worth mentioning it as an advisory to commercially supported open source projects

formula for the volume of

I've tried to give you a flavor of the kinds of challenges that exist in developing software out in the open.

the big alarm bell

would you continue to buy garments if you knew they were made with child or forced labor?

just like business cycles have booms and busts, so too will free and open source projects

though you think they're wrong.

This 1st challenge is one of education and social norms

Many of my friends were looking forward to this talk. indeed, follow-ups about Bitwarden