Machine Learning in the Regulatory Process
October 17, 2018
For example, in Pittsburgh, the county government has digitized all of its records and is beginning to use big-data analytical tools to improve its health and human services. While analyzing public data is nothing new, the scale at which it can now be analyzed effectively and efficiently is expanding. Pittsburgh now uses machine learning to identify children who are more likely to suffer injury or death because of potentially unsafe living conditions, a screening task that was previously done manually.
Another example is using machine learning to determine which restaurants need hygiene inspections by applying natural language processing to restaurant reviews. Machine learning can also help governments allocate their inspectors more efficiently.
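As a minimal sketch of the idea, consider scoring restaurants by how often hygiene-related words appear in their reviews. The word list, function name, and scoring rule here are all hypothetical simplifications; a real system would use a trained text classifier rather than keyword matching.

```python
# Hypothetical sketch: flag restaurants for inspection by scanning review
# text for hygiene-related terms. Keyword matching stands in for the far
# more sophisticated NLP models a real inspection program would use.

HYGIENE_TERMS = {"dirty", "sick", "roach", "smell", "vomit", "food poisoning"}

def hygiene_score(reviews):
    """Fraction of reviews containing at least one hygiene-related term."""
    if not reviews:
        return 0.0
    flagged = sum(
        1 for review in reviews
        if any(term in review.lower() for term in HYGIENE_TERMS)
    )
    return flagged / len(reviews)

reviews = [
    "Great service and tasty food!",
    "The kitchen looked dirty and I felt sick afterwards.",
    "Average meal, nothing special.",
]
print(hygiene_score(reviews))  # 1 of 3 reviews flagged
```

Restaurants whose scores exceed some threshold would be moved up the inspection queue, letting a limited pool of inspectors focus on the likeliest problems.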
These algorithms are not restricted to local issues, however. Machine learning is also being used to match refugees to the cities where they are most likely to integrate successfully. One such algorithm has boosted refugee employment by 40-70% in cities around the world.
Machine learning not only allows governments to analyze the large amounts of data they collect much faster, but it also reduces the number of errors made. Furthermore, machine learning algorithms may detect patterns better than humans can. For example, it is difficult for a human to spot every relationship between variables in a dataset with millions of data points, whereas an algorithm handles that volume of data, and finds those relationships, with relative ease. Additionally, humans tire and make sloppy mistakes; computers do not.
However, there are complications that officials must consider as they begin to implement machine learning. While the algorithms may not tire, they also lack a human mind's ability to recognize unusual data or notice when something does not look right. For example, if a column of people's ages contains negative values, a computer will incorporate those values into the model it computes, whereas a human would know to either impute them or exclude those observations. The need for good quality and consistency in data will therefore become even more crucial as these algorithms are implemented more widely. Additionally, the results of these algorithms are not infallible. Officials will have to check that the results actually make sense and that the relationships the machines find are causal, not merely correlational.
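The age example above can be sketched as a simple sanity check applied before modeling. The function below is a hypothetical illustration of the two options mentioned in the text, dropping invalid observations or imputing them, using a plausible-age range as the validity rule.

```python
# Hypothetical sketch: a human-style sanity check on an "age" column.
# Values outside a plausible range are treated as invalid and are either
# dropped or imputed with the median of the valid values.

def clean_ages(ages, drop=True):
    valid = [a for a in ages if 0 <= a <= 120]
    if drop:
        return valid
    median = sorted(valid)[len(valid) // 2]  # simple median of valid ages
    return [a if 0 <= a <= 120 else median for a in ages]

ages = [34, -2, 58, 130, 41]
print(clean_ages(ages))              # invalid values dropped: [34, 58, 41]
print(clean_ages(ages, drop=False))  # invalid values imputed with the median
```

Either choice changes the model's inputs, which is exactly the kind of judgment call a machine will not make on its own unless someone encodes it.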
Before these algorithms, it was easier to explain how conclusions were drawn and what analysis went into them. Humans were involved at most or every step, and someone had to make each decision and justify it. As long as those justifications were documented, it was straightforward to explain why certain choices were made and others rejected. In contrast, many new algorithms are black boxes that reveal little about how their conclusions are reached. It is becoming harder and harder for humans to understand every step or decision a machine makes. Furthermore, these machines have no intuitive understanding of the data; they simply find correlations and make predictions. All of this makes it harder for other people to trust an algorithm's findings, and it is something officials will need to address moving forward.
An illustration of the "black box" nature of machine learning algorithms.
By Docurbs - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=52379695
There are other issues that will need to be addressed going forward as well. Using an algorithm to shape public policy may seem unbiased on the surface, especially compared to past issues regarding officials and racial profiling. However, machine learning algorithms can encode existing biases, for example, in racialized law enforcement. One way to counter potential bias is to build the model to account for differences between populations and to avoid over-generalizations. Officials may also have to consider using different models for different populations. Another option is to have the algorithm apply fewer rules to populations that are less well represented in the data: a smaller sample may not be representative of that population as a whole, so the model should draw fewer conclusions from it. While these issues must be addressed in order to improve the algorithms, there is no single correct answer, so officials will have to weigh the pros and cons of each option.
Different algorithms and assumptions made can yield different results.
By Shiyu Ji - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=60632949
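The "fewer rules for less-represented populations" idea above can be sketched as a shrinkage estimate. This is a hypothetical illustration, not a method named in the text: each group's estimate is pulled toward the pooled average, and the less data a group has, the stronger the pull, so the model draws weaker conclusions from small samples.

```python
# Hypothetical sketch: per-group estimates that "shrink" toward the pooled
# mean when a group has little data, so small samples yield cautious
# conclusions rather than confident ones.

def group_estimates(data, k=10):
    """data: dict mapping group name -> list of observed outcomes.
    k controls how strongly small groups are pulled toward the pooled mean."""
    all_values = [v for values in data.values() for v in values]
    pooled = sum(all_values) / len(all_values)
    estimates = {}
    for group, values in data.items():
        n = len(values)
        group_mean = sum(values) / n
        weight = n / (n + k)  # small n -> little weight on the group's own mean
        estimates[group] = weight * group_mean + (1 - weight) * pooled
    return estimates

# A well-represented group keeps an estimate near its own mean; a tiny
# group's estimate is pulled much closer to the pooled mean.
est = group_estimates({"large_group": [1.0] * 90, "small_group": [0.0] * 3})
print(est)
```

The trade-off officials would weigh is visible in the parameter k: larger values make the model more conservative about underrepresented groups, at the cost of partially masking genuine differences between populations.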
In the end, once governments clear some of these present-day hurdles, the benefits of comprehensive machine learning implementation could be tremendous. The applications of these algorithms are potentially limitless and could improve many facets of public-sector services.
Student Blog Disclaimer
The views expressed on the Student Blog are the author’s opinions and don’t necessarily represent the Penn Wharton Public Policy Initiative’s strategies, recommendations, or opinions.