6. Conclusions and Future Work

Summary of goals and contributions

Our first goal was to accurately model the probability of default of a given loan while blinding ourselves to features that contain information about protected classes (race, gender, etc.).

Our second goal was to choose a subset of loans that would maximize our return on investment (ROI), given the predicted probability of default of loans.
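As a rough illustration of this selection step, the sketch below ranks loans by a simple expected-return score and keeps the top k. The scoring formula, the `loss_given_default` parameter, the field names, and the sample numbers are all assumptions made for illustration; they are not the formula or data used in our analysis.

```python
def expected_roi(p_default, interest_rate, loss_given_default=1.0):
    """Hypothetical one-period expected return per dollar invested.

    Assumes the loan either pays its full interest or defaults and
    loses `loss_given_default` of the principal.
    """
    return (1.0 - p_default) * interest_rate - p_default * loss_given_default


def pick_loans(loans, k):
    """Return the ids of the k loans with the highest expected ROI."""
    ranked = sorted(
        loans,
        key=lambda loan: expected_roi(loan["p_default"], loan["interest_rate"]),
        reverse=True,
    )
    return [loan["id"] for loan in ranked[:k]]


# Made-up loans for illustration only.
loans = [
    {"id": "A", "p_default": 0.02, "interest_rate": 0.08},
    {"id": "B", "p_default": 0.10, "interest_rate": 0.15},
    {"id": "C", "p_default": 0.05, "interest_rate": 0.12},
]
top_two = pick_loans(loans, 2)
```

In this toy example the highest-interest loan (B) is not selected, because its higher predicted default probability outweighs the extra interest.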

Our third goal was to assess the extent to which discrimination existed and then adjust the loans we chose to attain statistical parity.
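One common way to quantify statistical parity is to compare selection rates across groups; a gap of zero means every group is selected at the same rate. The sketch below is a minimal version of that check. The group labels and selection flags are hypothetical placeholders, not values from our data.

```python
from collections import defaultdict


def selection_rates(groups, selected):
    """Fraction of loans selected within each group."""
    totals = defaultdict(int)
    chosen = defaultdict(int)
    for group, flag in zip(groups, selected):
        totals[group] += 1
        chosen[group] += int(flag)
    return {group: chosen[group] / totals[group] for group in totals}


def statistical_parity_gap(groups, selected):
    """Largest difference in selection rate between any two groups."""
    rates = selection_rates(groups, selected)
    return max(rates.values()) - min(rates.values())


# Hypothetical group labels and selection decisions (1 = loan chosen).
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
selected = [1, 1, 1, 0, 1, 0, 0, 0]
gap = statistical_parity_gap(groups, selected)
```

Here group "a" is selected at a rate of 0.75 and group "b" at 0.25, so the gap is 0.5; adjusting the chosen loans toward parity would mean driving this number toward zero.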

Future work

While we explored many methods and formulated a strong investment strategy, several extensions remain to be explored.

Modeling Extensions

One potential area for exploration is to update our formula for expected ROI to include a term that accounts for average loan approval rates by zip code, which would incorporate another measure of fairness into our formulation.


We could also investigate a time series approach to data analysis for our investment strategy.

Fairness Extensions

On the fairness front, a next step would be to implement a model that incorporates some notion of group-level fairness directly into the loss function. This is an exciting avenue to explore, as it is an active research area in machine learning.
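To make the idea concrete, one possible form of such a loss adds a penalty on the gap between groups' average predicted default probabilities to an ordinary log loss. The sketch below is only one of many formulations; the penalty form, the weight `lam`, and the sample inputs are assumptions chosen for illustration, not a method we implemented.

```python
import math


def fairness_penalized_loss(y_true, p_pred, groups, lam=1.0):
    """Binary cross-entropy plus a squared group-gap penalty.

    The penalty is lam * (max group mean prediction - min group mean
    prediction) ** 2, a hypothetical stand-in for group-level fairness.
    """
    eps = 1e-12  # guards against log(0)
    bce = -sum(
        y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        for y, p in zip(y_true, p_pred)
    ) / len(y_true)

    # Collect predictions per group and compare their means.
    by_group = {}
    for group, p in zip(groups, p_pred):
        by_group.setdefault(group, []).append(p)
    means = [sum(values) / len(values) for values in by_group.values()]
    gap = max(means) - min(means)

    return bce + lam * gap ** 2


# Made-up labels, predictions, and group memberships.
y_true = [0, 0, 1, 1]
p_pred = [0.2, 0.2, 0.8, 0.8]
groups = ["a", "a", "b", "b"]
unpenalized = fairness_penalized_loss(y_true, p_pred, groups, lam=0.0)
penalized = fairness_penalized_loss(y_true, p_pred, groups, lam=1.0)
```

Setting `lam` to zero recovers the plain log loss, while larger values trade predictive accuracy for smaller differences between groups.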

It would also be wise to examine Lending Club's rejection data to identify characteristics that make a loan less likely to be approved, determine whether there is any implicit discrimination in that process, and potentially adjust our investment strategy accordingly.