Artificial Intelligence and Open Source

Artificial Intelligence (AI) is a focal domain for developers, for end-users and for the venture capital community. It’s as hot a commodity as Linux and open source were two decades ago. But AI and open source share more than just hype. Across natural language processing (NLP), machine learning (ML), computer vision and robotics, both AI and open source drive the democratization of technology, and open source is helping to drive the utility and ubiquity of AI platforms and applications.

Why Open Source

Developers leverage AI by accessing cloud platforms via open APIs, and by using and deploying tools for application development and platform management. Many (if not most) of these tools and models are made available as open source software. Here are a few of the more popular projects:


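| Project    | Description/Motto            | Repo     | License                                |
|------------|------------------------------|----------|----------------------------------------|
| H2O.ai     | Democratizing Generative AI  | Download | Apache 2.0                             |
| Keras      | Deep learning API            | GitHub   | Apache 2.0                             |
| Mistral    | Open LLM                     | GitHub   | Apache 2.0                             |
| OpenCV     | Computer Vision              | GitHub   | Apache 2.0                             |
| OpenPilot  | Driver Assistance            | GitHub   | MIT, others                            |
| PyTorch    | Python ML library            | GitHub   | BSD-style (multiple copyright holders) |
| Rasa       | Conversational AI            | GitHub   | Apache 2.0                             |
| TensorFlow | Production-grade ML          | GitHub   | Apache 2.0                             |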

Development tools, training engines, dashboards, and API implementations are all ideal fodder for open source publication.  Access to training, curation and query algorithms gives users assurances of future viability, visibility into project roadmap, and the opportunity to contribute bug reports, feature requests and even new functionality.

While these tools and capabilities are surely differentiating, especially across application areas, they only indirectly represent the core value-added of each AI platform. That value lies in the breadth, focus and quality of the training data.

When is open source AI useful?

So why publish AI software as open source? The rationale is not so different from that in other technology domains considering an open approach. AI projects look to open source when:

  • Real-world customers require access to source code, to ensure future access

  • A nascent project wants a broader user base to exercise and test the code or to enable key aspects of it (e.g., for hardware support and interoperability)

  • Users / developers want to target a different domain, e.g., using a drone code base for IoT

  • Multiple organizations want to ease collaboration, create an ecosystem or establish a de facto standard around a code base

And yes, sometimes project founders also favor free and open source software as a matter of principle.

When is open source AI less useful?

The current crop of AI platforms mostly runs as extremely large SaaS (Software-as-a-Service) entities. Large Language Models and the APIs associated with them are hosted and executed in the cloud, and occupy vast storage and runtime memory footprints, beyond all but the largest enterprise IT budgets. As such, while the source code may prove interesting to inspect, building and running it can fall beyond the means of interested parties.
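To put the memory footprint in rough numbers, here is a minimal back-of-the-envelope sketch in Python. It assumes only that each model weight occupies a fixed number of bytes (4 for 32-bit floats, 2 for 16-bit, 1 for 8-bit integers). The function name and the parameter counts are hypothetical, chosen for illustration, and the estimate covers the weights alone, not the additional memory needed at runtime or for training.

```python
# Rough lower bound on the memory needed just to hold a model's weights.
# Bytes per weight for three common numeric precisions.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gib(num_params: int, dtype: str = "float16") -> float:
    """Return the GiB required to store num_params weights in the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    # Hypothetical model sizes, for illustration only.
    sizes = [("7B", 7_000_000_000), ("70B", 70_000_000_000), ("175B", 175_000_000_000)]
    for label, n in sizes:
        row = ", ".join(
            f"{dtype}: {weight_memory_gib(n, dtype):,.0f} GiB" for dtype in BYTES_PER_PARAM
        )
        print(f"{label} parameters -> {row}")
```

Under these assumptions, a model with 70 billion parameters needs roughly 130 GiB at 16-bit precision for its weights alone, which is more than a typical workstation or single accelerator card can hold.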

Critics of open source AI worry that the openness of the code base offers a larger attack surface for malicious actors, as is the case with open source in general. But insisting on a closed approach (like Google’s original LaMDA project) excludes contributions from independent researchers and the public in general.

Example of AI and Intelligent Devices - OpenPilot

OpenPilot is an open source driver assistance system that operates across 250+ car models, including Toyota, Hyundai, Honda and other popular brands. It offers automated lane centering, adaptive cruise control, lane change assist, driver monitoring and long-duration self-driving, among other functions, all running on vehicle-local embedded hardware.

OpenPilot was developed by hardware manufacturer Comma, which builds after-market car control hardware that plugs into available vehicle bus connections (CAN and CAN-FD). Comma released OpenPilot as open source software with several important goals:

  • Expand the OpenPilot training base across more users driving more types of vehicles

  • Leverage a community of end users to expand the base of support across more vehicles and vehicle interfaces

  • Test its wares in the field (the goal of many open source projects)

  • Avoid regulatory and safety requirements by not delivering off-the-shelf products and solutions

That last goal is an interesting one. Vehicles with ADAS and full self-driving capability produced by the big commercial automakers are subject to the Federal Motor Vehicle Safety Standards (FMVSS) and to stringent oversight by NHTSA and other government agencies and industry groups. To skirt these requirements, Comma declares:

Any user of this software shall indemnify and hold harmless Comma.ai, Inc. and its directors, officers, employees, agents, stockholders, affiliates, subcontractors and customers from and against all allegations, claims, actions, suits, demands, damages, liabilities, obligations, losses, settlements, judgments, costs and expenses (including without limitation attorneys’ fees and costs) which arise out of, relate to or result from any use of this software by user.

THIS IS ALPHA QUALITY SOFTWARE FOR RESEARCH PURPOSES ONLY. THIS IS NOT A PRODUCT. YOU ARE RESPONSIBLE FOR COMPLYING WITH LOCAL LAWS AND REGULATIONS. NO WARRANTY EXPRESSED OR IMPLIED.

It is common for open source licenses to include such a disclaimer (no warranties, etc.).  Moreover, with the end-user also being the integrator, Comma faces a further reduced risk of litigation.

By licensing the OpenPilot training tools and other utilities under liberal MIT and related licenses, Comma enabled a broad swath of individual users and organizations to use and even redistribute much of this AI-centered project. The author has encountered other organizations that are exploring special use cases for the OpenPilot platform, such as supplying kits to enable older drivers to continue to drive while being monitored (a form of eldercare), and applications that help disabled people be more autonomous (augmentative technology).

The OpenPilot base is more compact than the data sets underlying ChatGPT, and is designed to reside fully on mobile devices (vs. more massive cloud-based training sets).  However, like creators of other AI offerings, Comma is more restrictive in licensing its machine training base – it’s not open source and its use is tied to the purchase of Comma hardware.

Conclusion

AI has applications that span the gamut of human endeavor, from content creation to medical diagnosis to scientific research to robotics to artistic creation to national defense to test automation and beyond. Offering key components and even whole platforms as open source software has the potential to make AI applications even more ubiquitous and easier to repurpose, integrate and deploy. But given that the greatest value to an AI platform derives from training via machine learning, open source AI is not a “free beer” panacea: extracting value from AI still requires substantial investments in integration effort, training and often commercial licensing of the “intelligence” that drives it.