Exploring Patent Networks Using U.S. Patent and Trademark Office Patent Records

Project Summary

This data expedition explores the local (ego) patent citation networks of three hybrid vehicle-related patents. Patent citations and technological development are core themes in the study of innovation and entrepreneurship. The purpose of these network explorations is to assess, both quantitatively and visually, how innovations are connected and what these connections mean for the focal innovations and for the later technologies that draw on them. The expedition was incorporated into the Sociology of Entrepreneurship class, where students study the emergence and diffusion of innovations.

Year: 2018

Graduate Students: Josh Bruce (joshua.bruce@duke.edu) and Molly Copeland (molly.copeland@duke.edu)

Faculty: Dr. Martin Ruef

Course: Sociology of Entrepreneurship (SOCIOL/MMS 359)

Guiding Questions

The main research question students addressed in this expedition was: What are the characteristics of innovation networks? Two sub-questions follow from it. First, how can we conceptualize and visualize patent citations as evidence of innovation network structures? Second, how can we characterize patent ego-networks with common ego-network measures (such as size or density)?
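As a concrete illustration of the two measures named above, the sketch below computes size and density for a tiny, made-up ego-network. It is written in plain Python for readability and is not the course's R/igraph code. The patent labels and the helper name `ego_size_and_density` are hypothetical. It also assumes one common convention: size is the number of alters, and density is the share of possible directed ties among alters (excluding the ego) that are actually present. Other conventions include the ego in both counts.

```python
def ego_size_and_density(ego, edges):
    """Return (size, density) for an ego-network.

    edges: iterable of (citing, cited) pairs.
    Assumed convention: size = number of alters; density = directed ties
    among alters divided by k * (k - 1), where k is the number of alters.
    """
    nodes = {node for edge in edges for node in edge}
    alters = nodes - {ego}
    # Keep only ties that run between two alters; the ego's own ties are excluded.
    alter_ties = {(a, b) for a, b in edges if a != ego and b != ego and a != b}
    k = len(alters)
    possible = k * (k - 1)  # number of directed ties possible among k alters
    density = len(alter_ties) / possible if possible else 0.0
    return k, density


# Hypothetical network: B, C, and D all cite focal patent A; C and D also cite B.
edges = [("B", "A"), ("C", "A"), ("D", "A"), ("C", "B"), ("D", "B")]
size, density = ego_size_and_density("A", edges)
print(size, round(density, 3))  # prints: 3 0.333
```

Because citations can only point backward in time, a citation network can never reach a directed density of 1, which is one reason to compare densities across networks with care.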

To answer these questions, groups of students worked with different patent ego-networks in the data, each relevant to broader topics covered in the class and in class projects. Students adapted visualizations to reflect attributes of interest in the data (e.g., patent technology classes), chose descriptive measures that provide insight into broader questions from other innovation projects in the class, and compared those measures across networks. After constructing the ego-networks, students also brainstormed potential pitfalls of conducting and reporting descriptive network analysis (e.g., consistency in visualization, comparability across networks, interpretation of network measures, and coding and R-specific issues that commonly arise).

The specific techniques and teaching goals include:

  • Introduction to R and RStudio (with a focus on the network-specific package igraph)
  • Introduction to RStudio workflow components (e.g., installing and loading packages, directories and workflow, what coding is and why we do it)
  • Introduction to network basics
    • Conceptually, we discuss:
      • What is a network?
      • How can citations make a network (i.e., ego-networks of citation) and what other data could be conceptualized as networks?
      • What do these networks tell us?
    • Practically:
      • Identifying network components (e.g., nodes, edges, attributes)
      • Constructing network objects
      • Visualizing networks
      • Calculating typical network descriptive measures
  • Meaningful interpretation of networks related to innovation
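The practical steps listed above (identifying nodes, edges, and attributes, constructing a network object, and calculating a descriptive measure) can be sketched in a few lines. In the class this was done in R with igraph; the version below uses plain Python dictionaries only to make the underlying data structures visible. All patent IDs, attribute values, and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical edge list: each pair is (citing patent, cited patent).
edge_list = [("P2", "P1"), ("P3", "P1"), ("P4", "P1"), ("P3", "P2")]

# Hypothetical node attributes, one record per patent.
attributes = {
    "P1": {"tech_class": "180", "claims": 12, "year": 2001},
    "P2": {"tech_class": "180", "claims": 8, "year": 2005},
    "P3": {"tech_class": "903", "claims": 20, "year": 2009},
    "P4": {"tech_class": "701", "claims": 5, "year": 2011},
}


def build_network(edge_list, attributes):
    """Combine an edge list and an attribute table into one network object."""
    out_edges = defaultdict(set)
    for citing, cited in edge_list:
        for patent_id in (citing, cited):
            if patent_id not in attributes:
                raise KeyError(f"no attributes for patent {patent_id}")
        out_edges[citing].add(cited)
    return {"nodes": dict(attributes), "out_edges": dict(out_edges)}


def in_degree(network):
    """Count how many times each patent is cited within the network."""
    counts = {patent_id: 0 for patent_id in network["nodes"]}
    for cited_set in network["out_edges"].values():
        for cited in cited_set:
            counts[cited] += 1
    return counts


network = build_network(edge_list, attributes)
print(in_degree(network))  # prints: {'P1': 3, 'P2': 1, 'P3': 0, 'P4': 0}
```

Checking that every patent in the edge list has an attribute record is a small guard against one of the pitfalls students discussed: mismatched node and edge files that fail silently.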

Below is an example of the network students learned to create, based on a single focal patent in the center of the network. Nodes are sized according to their number of technological claims and shaded by their primary technology class.

Example of network
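The sizing and shading used in this kind of figure amount to two small mappings from node attributes to visual properties. The sketch below shows one way to compute them in plain Python, not the course's R/igraph plotting code. The function names, the size range, the color palette, and the sample patents are all assumptions made for illustration.

```python
def scale_sizes(claims, min_size=5.0, max_size=20.0):
    """Linearly rescale claim counts into a range of node sizes."""
    lo, hi = min(claims.values()), max(claims.values())
    if hi == lo:
        # All patents have the same number of claims: use one middle size.
        return {pid: (min_size + max_size) / 2 for pid in claims}
    span = hi - lo
    return {
        pid: min_size + (count - lo) / span * (max_size - min_size)
        for pid, count in claims.items()
    }


def shade_by_class(tech_class, palette=("#1b9e77", "#d95f02", "#7570b3", "#e7298a")):
    """Assign one color per technology class, cycling if classes outnumber colors."""
    classes = sorted(set(tech_class.values()))
    color_of = {cls: palette[i % len(palette)] for i, cls in enumerate(classes)}
    return {pid: color_of[cls] for pid, cls in tech_class.items()}


# Hypothetical attributes for three patents.
claims = {"P1": 10, "P2": 20, "P3": 30}
tech_class = {"P1": "180", "P2": "903", "P3": "180"}

sizes = scale_sizes(claims)
colors = shade_by_class(tech_class)
print(sizes)  # prints: {'P1': 5.0, 'P2': 12.5, 'P3': 20.0}
```

Fixing the size range and the class-to-color mapping across all three networks keeps the figures visually comparable.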

The Dataset

The data for this Expedition consist of patent records from the US Patent and Trademark Office (USPTO). The raw data are individual patents granted by the USPTO, which include information on the nature of the technology being patented, the year the patent was granted, the patents each focal patent cites as “prior art,” and numerous other data points. The version of the data used for this Expedition is publicly available from the PatentsView project (www.patentsview.org), a joint effort between the USPTO, American Institutes for Research, NYU, UC Berkeley, and other stakeholders. PatentsView.org exists to make the bulk USPTO data files accessible for research and practitioner use.

The dataset used in the Expedition is a small subset of the total patent database. It was collected by identifying three focal patents (the egos in the example networks used by students) and then collecting all patents that cited those three focal patents up to 2017. The citations among these patents were also collected, creating three distinct patent ego-networks. For both focal and citing patents, we have metadata on the primary technology class, patent title, abstract, number of claims, and year granted by the USPTO.
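The collection procedure described above can be sketched as a two-step filter over a larger citation table. The code below is a minimal plain-Python illustration under stated assumptions, not the script used to build the course files. The function name, the toy citation pairs, and the grant years are hypothetical, and the sketch assumes that "up to 2017" refers to the citing patent's grant year.

```python
def extract_ego_network(focal, citations, grant_year, cutoff=2017):
    """Extract one patent ego-network from a larger citation table.

    citations: list of (citing, cited) pairs.
    grant_year: dict mapping patent ID to grant year.
    Step 1: find patents that cite the focal patent, granted by the cutoff year.
    Step 2: keep every citation whose two endpoints are both in the network.
    """
    citers = {
        citing
        for citing, cited in citations
        if cited == focal and grant_year[citing] <= cutoff
    }
    members = citers | {focal}
    edges = [(a, b) for a, b in citations if a in members and b in members]
    return members, edges


# Hypothetical citation table. D cites A but was granted after the cutoff,
# E cites C but not A, and Z is cited by B but never cites A.
citations = [("B", "A"), ("C", "A"), ("C", "B"), ("D", "A"), ("E", "C"), ("B", "Z")]
grant_year = {"A": 2000, "B": 2005, "C": 2010, "D": 2019, "E": 2012, "Z": 1995}

members, edges = extract_ego_network("A", citations, grant_year)
print(sorted(members))  # prints: ['A', 'B', 'C']
print(edges)            # prints: [('B', 'A'), ('C', 'A'), ('C', 'B')]
```

Running the same function once per focal patent would yield the three separate ego-networks, each with its own edge list and member set.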

Course Materials

Networks of Innovation (PowerPoint presentation)

Patent Nets Code.R

patent_1_attributes.Rdata

patent_1_edge_list.Rdata

patent_2_attributes.Rdata

patent_2_edge_list.Rdata

patent_3_attributes.Rdata

patent_3_edge_list.Rdata
