The traditional credit score is one of the most consequential and most flawed tools in modern finance. FICO scores and their equivalents were designed for a world where most adults had formal employment, maintained checking accounts at regulated banks, and had years of credit history available for analysis. For the roughly 1.4 billion adults globally who lack access to formal banking — and for the hundreds of millions more who have thin credit files because they are young, recently immigrated, or operate primarily in the cash economy — the traditional credit scoring system is not merely unhelpful. It is actively exclusionary.
The consequence of credit invisibility is severe. Without access to affordable credit, individuals cannot smooth consumption during income shocks, cannot invest in education or equipment that would improve their earning potential, and cannot build the financial cushion that separates economic stability from perpetual financial fragility. Small business owners cannot access working capital to grow. Farmers cannot purchase inputs at the beginning of the growing season and repay at harvest. The economic multiplier of affordable credit access is enormous — and it is largely unavailable to the people who need it most.
Machine learning and alternative data are changing this equation. The last decade has seen an explosion of fintech lending companies that use non-traditional data sources and advanced statistical models to underwrite borrowers who would be invisible to traditional credit bureaus. This piece examines how these models work, what they get right, what risks they introduce, and how responsible deployment can maximize inclusion benefits while minimizing harm.
Alternative data for credit underwriting refers to any data source not included in traditional credit bureau reports. The universe of alternative data is vast and varied: mobile phone recharge and airtime top-up patterns, utility payment records, digital wallet and mobile money transaction histories, and other traces of day-to-day financial behavior.
The reason alternative data works — when used carefully — is that financial behavior leaves traces across many different data systems, and the consistency and reliability of those traces correlate with creditworthiness. A person who reliably recharges their mobile phone on time, pays their utilities before the due date, and maintains consistent transaction patterns in their digital wallet is demonstrating financial discipline that a machine learning model can identify and weight appropriately — even without a credit bureau record.
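To make the idea concrete, here is a minimal sketch of how raw behavioral events might be turned into model features. The feature names (`on_time_rate`, `recharge_cv`) and the event representation are illustrative assumptions, not any particular lender's schema:

```python
from statistics import mean, stdev

def payment_features(due_days, paid_days):
    """On-time rate and mean days late, from paired due/paid dates (as day offsets)."""
    lateness = [paid - due for due, paid in zip(due_days, paid_days)]
    return {
        "on_time_rate": sum(1 for d in lateness if d <= 0) / len(lateness),
        "mean_days_late": mean(max(d, 0) for d in lateness),
    }

def recharge_regularity(recharge_days):
    """Coefficient of variation of gaps between top-ups; lower = more regular."""
    gaps = [b - a for a, b in zip(recharge_days, recharge_days[1:])]
    return stdev(gaps) / mean(gaps)

# A borrower who pays on or before each due date and tops up on a steady weekly cadence
features = payment_features([30, 60, 90, 120], [29, 60, 88, 119])
features["recharge_cv"] = recharge_regularity([1, 8, 15, 22, 29])
```

Features like these — simple, interpretable summaries of behavioral consistency — are the raw material that downstream scoring models weight and combine.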
Early fintech lending models used relatively simple logistic regression approaches with a handful of alternative data features. The current generation uses considerably more sophisticated architectures — gradient boosting models, neural networks, and increasingly, large language models that can process unstructured data signals — and incorporates hundreds or thousands of features from multiple data sources.
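As a sketch of what those early-generation models looked like, here is a toy logistic regression trained by plain gradient descent on two hypothetical alternative-data features. The feature values and labels are fabricated for illustration; a production system would use a mature library and far more features:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=500):
    """Stochastic gradient descent on log loss; X rows are feature vectors."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of log loss w.r.t. the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

# Features: [on_time_rate, recharge_regularity]; label 1 = loan repaid
X = [[0.90, 0.80], [0.95, 0.90], [0.20, 0.30], [0.30, 0.10]]
y = [1, 1, 0, 0]
w, b = train_logreg(X, y)
score = sigmoid(sum(wj * xj for wj, xj in zip(w, [0.85, 0.70])) + b)
```

The current generation replaces this with gradient boosting or neural architectures over hundreds of features, but the underlying logic — learned weights over behavioral signals — is the same.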
The models that work best in underbanked lending contexts tend to share several characteristics. First, they are built and validated on data from the specific population being served rather than being adapted from models trained on mainstream credit populations. A model trained on US prime borrowers will perform poorly on Kenyan informal workers, not because the concept of creditworthiness is different but because the behavioral proxies for it are completely different. Local market training data is essential.
Second, the best models are designed for dynamic updating — incorporating new data signals as the borrower relationship develops over time. First-loan underwriting based on limited data is necessarily less accurate than underwriting for a repeat borrower with twelve months of repayment history. Building loan products with small initial amounts and short tenors that generate repayment data quickly, then using that data to graduate borrowers to larger, longer loans at better rates, is the most reliable path to accurate underwriting in data-sparse markets.
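The graduation logic described above can be sketched as a simple policy. The specific numbers here — a small first loan, a growth multiplier, a cap — are illustrative assumptions, not a recommended product design:

```python
def next_offer(current_amount, history, first_loan=100, growth=1.5, max_amount=2000):
    """Graduate the limit only after a clean repayment record at the current tier."""
    if not history:
        return first_loan   # small, short first loan to generate repayment data quickly
    if all(loan["repaid_on_time"] for loan in history):
        return min(round(current_amount * growth), max_amount)
    return current_amount   # hold the limit (in practice, perhaps reduce) after a late repayment

offer = next_offer(0, [])                                  # first-time borrower
offer = next_offer(100, [{"repaid_on_time": True}])        # graduated after clean repayment
```

Each repaid loan generates exactly the data the underwriting model lacked at origination, which is why small-and-short-first products outperform attempts to underwrite large initial loans from thin data.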
The promise of AI-driven credit underwriting — that objective data-driven models will be fairer than human underwriters subject to conscious and unconscious bias — has not always been borne out in practice. Machine learning models can perpetuate and amplify discrimination if they are trained on historical lending data that reflects past discriminatory practices, or if they use features that correlate with protected characteristics like race, gender, or national origin.
This is not a theoretical concern. Several high-profile cases in the United States have shown that algorithmic lending models can produce outcomes that disadvantage minority borrowers even when race is not explicitly included as a feature — because features that seem neutral, like zip code, educational institution, or employment sector, can serve as proxies for race in ways the model learns but humans do not notice during design.
Responsible AI credit underwriting requires explicit fairness testing across demographic subgroups, explainability mechanisms that allow borrowers to understand why they were denied credit and take steps to improve their profiles, and human oversight of model outputs — particularly for edge cases and denied applications. The companies building in this space that we admire most are those that treat fairness not as a compliance checkbox but as a core design principle that shapes every feature selection, model architecture, and deployment decision.
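One common form of the subgroup fairness testing mentioned above is comparing approval rates across demographic groups and flagging large gaps — for instance, against the "four-fifths" adverse impact ratio used in US employment and fair-lending analysis. The sketch below uses fabricated decisions and group labels purely for illustration:

```python
from collections import defaultdict

def approval_rates(decisions):
    """decisions: list of (group, approved) pairs from a holdout evaluation set."""
    totals, approved = defaultdict(int), defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        approved[group] += ok
    return {g: approved[g] / totals[g] for g in totals}

def adverse_impact_ratio(rates):
    """Lowest group approval rate over highest; values below 0.8 warrant investigation."""
    return min(rates.values()) / max(rates.values())

# Fabricated outcomes: group A approved 80/100, group B approved 55/100
decisions = [("A", 1)] * 80 + [("A", 0)] * 20 + [("B", 1)] * 55 + [("B", 0)] * 45
rates = approval_rates(decisions)
air = adverse_impact_ratio(rates)
```

A low ratio does not prove discrimination — base rates can differ across groups — but it is the kind of tripwire that triggers the human review and feature audits described above.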
Blok AI Capital has developed a framework for evaluating AI lending companies that we believe separates responsible inclusion-focused lenders from those who use the language of inclusion while building extractive products.
If you are building in AI-driven lending for underserved populations, we would welcome a conversation about what you are building and how Blok AI Capital might support your work. Connect with us via the contact page.