Exploring Patent Networks Using U.S. Patent and Trademark Office Patent Records

Project Summary

This data expedition explores the local (ego) patent citation networks of three hybrid vehicle-related patents. Patent citations and technological development are core themes in the study of innovation and entrepreneurship. The purpose of these network explorations is to assess, both quantitatively and visually, how innovations are connected and what those connections mean for the focal innovations and for the later technologies that build on them. The expedition was incorporated into the Sociology of Entrepreneurship course, in which students study the emergence and diffusion of innovations.

Year: 2018

Graduate Students: Josh Bruce (joshua.bruce@duke.edu) and Molly Copeland (molly.copeland@duke.edu)

Faculty: Dr. Martin Ruef

Course: Sociology of Entrepreneurship (SOCIOL/MMS 359)

Guiding Questions

The main research question students addressed in this expedition was: What are the characteristics of innovation networks? To address it, students considered two further questions. How can we conceptualize and visualize patent citations as evidence of innovation network structure? And how can we characterize innovation ego-networks using common ego-network measures, such as size or density?
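As a rough illustration of the two measures named above, the sketch below computes ego-network size and density for a small, made-up citation ego-network. It is written in plain Python (standard library only) rather than the R/igraph code used in the course, and the patent IDs and the function name ego_measures are hypothetical. Size is counted here as the number of alters (the ego is excluded), and density treats citations as directed ties, so a network of n patents has n(n - 1) possible ties. Some software counts the ego itself in the size, so it is worth checking which definition a given tool uses.

```python
# Hypothetical toy ego-network: "F" is the focal (ego) patent.
# Each pair (a, b) is a directed citation tie: patent a cites patent b.
edges = [
    ("P1", "F"), ("P2", "F"), ("P3", "F"), ("P4", "F"),  # alters citing the ego
    ("P2", "P1"), ("P4", "P2"),                          # citations among alters
]


def ego_measures(edges, ego):
    """Return ego-network size (number of alters) and directed density."""
    nodes = {n for tie in edges for n in tie}
    ties = {(a, b) for a, b in edges if a != b}  # drop duplicates and self-citations
    n = len(nodes)
    possible = n * (n - 1)  # ordered pairs of distinct patents
    return {
        "size": len(nodes - {ego}),
        "density": len(ties) / possible if possible else 0.0,
    }


measures = ego_measures(edges, "F")
print(measures["size"])               # 4
print(round(measures["density"], 2))  # 6 ties / (5 * 4) possible = 0.3
```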

To answer these questions, groups of students worked with different patent ego-networks in the data that were relevant to broader topics covered in the class and in class projects. Students adapted visualizations to highlight attributes of interest in the data (e.g., patent technology classes), chose descriptive measures that shed light on broader questions in other innovation projects in the class, and compared those measures across networks. After constructing the ego-networks, students also brainstormed potential pitfalls of conducting and reporting descriptive network analysis (e.g., consistency in visualization, comparing across networks, interpreting network measures, and common coding and R-specific issues).
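One of these pitfalls, comparing measures across networks, shows up even in a tiny made-up example. The sketch below (plain Python; the data and the names density and toy_ego_network are hypothetical) builds two ego-networks with the same citation pattern, where every alter cites the ego and one pair of alters cite each other, but with 3 versus 30 alters. Density drops sharply as the network grows even though the pattern is unchanged, and it also depends on whether the ego is included in the calculation.

```python
def density(edges, nodes):
    """Directed density among `nodes`: observed ties / n * (n - 1)."""
    nodes = set(nodes)
    n = len(nodes)
    ties = {(a, b) for a, b in edges if a in nodes and b in nodes and a != b}
    return len(ties) / (n * (n - 1)) if n > 1 else 0.0


def toy_ego_network(ego, k):
    """Hypothetical ego-network: k >= 2 alters all cite the ego; alter 1 also cites alter 0."""
    alters = [f"{ego}_c{i}" for i in range(k)]
    edges = [(a, ego) for a in alters] + [(alters[1], alters[0])]
    return edges, alters


results = {}
for ego, k in [("F1", 3), ("F2", 30)]:
    edges, alters = toy_ego_network(ego, k)
    results[ego] = (density(edges, alters + [ego]), density(edges, alters))
    with_ego, without_ego = results[ego]
    print(f"{ego}: {k} alters, density with ego = {with_ego:.3f}, without ego = {without_ego:.4f}")
```

With these toy numbers, the small network's density is 4/12 with the ego and 1/6 without it, while the large network's is 31/930 and 1/870. Reporting network size alongside density, and stating whether the ego is included, keeps such comparisons interpretable.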

The specific techniques and teaching goals include:

  • Introduction to R and RStudio (with a focus on the network-specific package igraph)
  • Introduction to RStudio workflow components (e.g., installing and loading packages, managing directories, and what coding is and why we do it)
  • Introduction to network basics
    • Conceptually, we discuss:
      • What is a network?
      • How can citations form a network (i.e., citation ego-networks), and what other data could be conceptualized as networks?
      • What do these networks tell us?
    • Practically:
      • Identifying network components (e.g., nodes, edges, attributes)
      • Constructing network objects
      • Visualizing networks
      • Calculating typical network descriptive measures
  • Meaningful interpretation of networks related to innovation
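The practical steps listed above (identifying nodes, edges, and attributes; constructing a network object; calculating descriptive measures) can be sketched without any network library. The example below is plain Python with made-up data; the attribute names tech_class, claims, and year and the function build_network are hypothetical stand-ins, not necessarily what the course's .Rdata files or R script contain. It joins an edge list to an attribute table, checks that every patent in the edge list has attributes, and computes in-degree (times cited within the ego-network) and out-degree (citations made within it).

```python
from collections import Counter

# Hypothetical edge list: (citing patent, cited patent).
edge_list = [("P1", "F"), ("P2", "F"), ("P3", "F"), ("P2", "P1")]

# Hypothetical node-attribute table keyed by patent ID.
attributes = {
    "F":  {"tech_class": "A", "claims": 20, "year": 2004},
    "P1": {"tech_class": "A", "claims": 8,  "year": 2009},
    "P2": {"tech_class": "B", "claims": 15, "year": 2012},
    "P3": {"tech_class": "C", "claims": 11, "year": 2016},
}


def build_network(edge_list, attributes):
    """Combine edges and attributes into a simple directed-network structure."""
    missing = {p for tie in edge_list for p in tie} - set(attributes)
    if missing:
        raise ValueError(f"patents without attributes: {sorted(missing)}")
    cites = {p: set() for p in attributes}     # outgoing ties
    cited_by = {p: set() for p in attributes}  # incoming ties
    for citing, cited in edge_list:
        cites[citing].add(cited)
        cited_by[cited].add(citing)
    return {"attributes": attributes, "cites": cites, "cited_by": cited_by}


net = build_network(edge_list, attributes)
in_degree = {p: len(s) for p, s in net["cited_by"].items()}
out_degree = {p: len(s) for p, s in net["cites"].items()}
class_counts = Counter(a["tech_class"] for a in net["attributes"].values())

print(in_degree)     # {'F': 3, 'P1': 1, 'P2': 0, 'P3': 0}
print(out_degree)    # {'F': 0, 'P1': 1, 'P2': 2, 'P3': 1}
print(class_counts)  # Counter({'A': 2, 'B': 1, 'C': 1})
```

In R, igraph's graph_from_data_frame() serves a similar purpose: it takes an edge data frame plus an optional vertices data frame of attributes and reports an error if an edge names a vertex missing from that table.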

Below is an example of the network students learned to create, centered on a single focal patent. Nodes are sized by their number of claims and shaded by their primary technology class.

Example of network
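The visual encoding in this example (node size by claim count, shade by primary technology class) amounts to mapping two attributes onto plotting values. A minimal sketch in plain Python follows; the claim counts, class labels, size range, gray palette, and the function names node_sizes and class_shades are all hypothetical. Linear rescaling is only one option; other scalings, such as square root, are also common.

```python
def node_sizes(claims, min_size=5.0, max_size=25.0):
    """Linearly rescale claim counts into [min_size, max_size]."""
    lo, hi = min(claims.values()), max(claims.values())
    if hi == lo:  # all patents have the same number of claims
        return {p: (min_size + max_size) / 2 for p in claims}
    span = max_size - min_size
    return {p: min_size + (c - lo) / (hi - lo) * span for p, c in claims.items()}


def class_shades(tech_class, palette):
    """Give each distinct technology class its own palette entry (classes sorted)."""
    classes = sorted(set(tech_class.values()))
    if len(classes) > len(palette):
        raise ValueError("palette has fewer shades than technology classes")
    shade_of = dict(zip(classes, palette))
    return {p: shade_of[c] for p, c in tech_class.items()}


# Hypothetical attributes for a four-patent ego-network.
claims = {"F": 20, "P1": 5, "P2": 35, "P3": 12}
tech_class = {"F": "A", "P1": "A", "P2": "B", "P3": "C"}

sizes = node_sizes(claims)
shades = class_shades(tech_class, ["#d9d9d9", "#969696", "#252525"])
print(sizes)   # F: 15.0, P1: 5.0, P2: 25.0, P3: about 9.67
print(shades)  # F and P1 share a shade because they share class "A"
```

Using the same sorted class-to-shade mapping and the same size range for every ego-network keeps figures comparable across networks. In R's igraph, vectors like these can be passed to plot() through the vertex.size and vertex.color arguments.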

The Dataset

The data for this Expedition consist of patent records from the U.S. Patent and Trademark Office (USPTO). The raw data are individual patents granted by the USPTO, which include information on the nature of the patented technology, the year the patent was granted, the patents each focal patent cites as “prior art,” and numerous other data points. The version of the data used for this Expedition is publicly available from the PatentsView project (www.patentsview.org), a joint effort among the USPTO, the American Institutes for Research, NYU, UC Berkeley, and other stakeholders. PatentsView makes the bulk USPTO data files accessible for research and practitioner use.

The dataset used in the Expedition is a small subset of the full patent database. It was assembled by identifying three focal patents (the egos in the networks students worked with) and then collecting all patents that cited those focal patents up to 2017. Citations among these patents were also collected, yielding three distinct patent ego-networks. For both focal and citing patents, the data include metadata on primary technology class, patent title, abstract, number of claims, and year granted by the USPTO.
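The sampling procedure described here (a focal patent, the patents citing it up to a cutoff year, and the citations among them) can be sketched as a filter over a larger citation table. The function citation_ego_network below is a hypothetical illustration in plain Python, not the code used to assemble the Expedition's data. It assumes the cutoff applies to the citing patent's grant year and is inclusive; citations that point outside the selected set, such as the focal patent's own prior art, are dropped.

```python
def citation_ego_network(citations, focal, grant_year, last_year):
    """
    Extract a citation ego-network around `focal`.

    citations:  iterable of (citing, cited) patent-ID pairs
    grant_year: dict mapping patent ID -> year granted
    last_year:  keep citing patents granted in or before this year (assumed inclusive)
    """
    citations = list(citations)
    citing = {
        a for a, b in citations
        if b == focal and a in grant_year and grant_year[a] <= last_year
    }
    members = citing | {focal}
    # Keep only ties whose endpoints are both in the ego-network.
    ties = sorted({(a, b) for a, b in citations if a in members and b in members})
    return members, ties


# Hypothetical citation records and grant years.
citations = [
    ("P1", "F"), ("P2", "F"), ("P3", "F"), ("P5", "F"),
    ("P2", "P1"),  # tie among citing patents: kept
    ("P3", "X9"),  # cites a patent outside the network: dropped
    ("P4", "P1"),  # P4 never cites F, so it is not an alter
    ("F", "X1"),   # the focal patent's own prior art: dropped
]
grant_year = {"F": 2004, "P1": 2009, "P2": 2012, "P3": 2016,
              "P4": 2015, "P5": 2019, "X1": 1998, "X9": 2000}

members, ties = citation_ego_network(citations, "F", grant_year, last_year=2017)
print(sorted(members))  # ['F', 'P1', 'P2', 'P3']  (P5 was granted after 2017)
print(ties)             # [('P1', 'F'), ('P2', 'F'), ('P2', 'P1'), ('P3', 'F')]
```

Running the same filter once per focal patent would produce one ego-network for each of the three focal patents.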

Course Materials

Networks of Innovation (PowerPoint presentation)

Patent Nets Code.R

patent_1_attributes.Rdata

patent_1_edge_list.Rdata

patent_2_attributes.Rdata

patent_2_edge_list.Rdata

patent_3_attributes.Rdata

patent_3_edge_list.Rdata
