The traditional credit score is one of the most consequential and most flawed tools in modern finance. FICO scores and their equivalents were designed for a world where most adults had formal employment, maintained checking accounts at regulated banks, and had years of credit history available for analysis. For the roughly 1.4 billion adults globally who lack access to formal banking — and for the hundreds of millions more who have thin credit files because they are young, recently immigrated, or operate primarily in the cash economy — the traditional credit scoring system is not merely unhelpful. It is actively exclusionary.
The consequence of credit invisibility is severe. Without access to affordable credit, individuals cannot smooth consumption during income shocks, cannot invest in education or equipment that would improve their earning potential, and cannot build the financial cushion that separates economic stability from perpetual financial fragility. Small business owners cannot access working capital to grow. Farmers cannot purchase inputs at the beginning of the growing season and repay at harvest. The economic multiplier of affordable credit access is enormous — and it is largely unavailable to the people who need it most.
Machine learning and alternative data are changing this equation. The last decade has seen an explosion of fintech lending companies that use non-traditional data sources and advanced statistical models to underwrite borrowers who would be invisible to traditional credit bureaus. This piece examines how these models work, what they get right, what risks they introduce, and how responsible deployment can maximize inclusion benefits while minimizing harm.
Alternative data for credit underwriting refers to any data source not included in traditional credit bureau reports. The universe of alternative data is vast and varied: mobile phone recharge and airtime top-up patterns, utility payment records, digital wallet and mobile money transaction histories, and other traces of day-to-day financial behavior.
The reason alternative data works — when used carefully — is that financial behavior leaves traces across many different data systems, and the consistency and reliability of those traces correlate with creditworthiness. A person who reliably recharges their mobile phone on time, pays their utilities before the due date, and maintains consistent transaction patterns in their digital wallet is demonstrating financial discipline that a machine learning model can identify and weight appropriately — even without a credit bureau record.
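To make the idea concrete, here is a minimal sketch of how raw behavioral events might be turned into model features. The feature names (`on_time_rate`, `recharge_cv`) and the event representation are illustrative assumptions, not any particular lender's schema:

```python
from statistics import mean, stdev

def payment_features(due_days, paid_days):
    """On-time rate and mean days late, from paired due/paid dates (as day offsets)."""
    lateness = [paid - due for due, paid in zip(due_days, paid_days)]
    return {
        "on_time_rate": sum(1 for d in lateness if d <= 0) / len(lateness),
        "mean_days_late": mean(max(d, 0) for d in lateness),
    }

def recharge_regularity(recharge_days):
    """Coefficient of variation of gaps between top-ups; lower = more regular."""
    gaps = [b - a for a, b in zip(recharge_days, recharge_days[1:])]
    return stdev(gaps) / mean(gaps)

# A borrower who pays on or before each due date and tops up on a steady weekly cadence
features = payment_features([30, 60, 90, 120], [29, 60, 88, 119])
features["recharge_cv"] = recharge_regularity([1, 8, 15, 22, 29])
```

Features like these — simple, interpretable summaries of behavioral consistency — are the raw material that downstream scoring models weight and combine.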
Early fintech lending models used relatively simple logistic regression approaches with a handful of alternative data features. The current generation uses considerably more sophisticated architectures — gradient boosting models, neural networks, and increasingly, large language models that can process unstructured data signals — and incorporates hundreds or thousands of features from multiple data sources.
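As a sketch of what those early-generation models looked like, here is a toy logistic regression trained by plain gradient descent on two hypothetical alternative-data features. The feature values and labels are fabricated for illustration; a production system would use a mature library and far more features:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=500):
    """Stochastic gradient descent on log loss; X rows are feature vectors."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of log loss w.r.t. the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

# Features: [on_time_rate, recharge_regularity]; label 1 = loan repaid
X = [[0.90, 0.80], [0.95, 0.90], [0.20, 0.30], [0.30, 0.10]]
y = [1, 1, 0, 0]
w, b = train_logreg(X, y)
score = sigmoid(sum(wj * xj for wj, xj in zip(w, [0.85, 0.70])) + b)
```

The current generation replaces this with gradient boosting or neural architectures over hundreds of features, but the underlying logic — learned weights over behavioral signals — is the same.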
The models that work best in underbanked lending contexts tend to share several characteristics. First, they are built and validated on data from the specific population being served rather than being adapted from models trained on mainstream credit populations. A model trained on US prime borrowers will perform poorly on Kenyan informal workers, not because the concept of creditworthiness is different but because the behavioral proxies for it are completely different. Local market training data is essential.
Second, the best models are designed for dynamic updating — incorporating new data signals as the borrower relationship develops over time. First-loan underwriting based on limited data is necessarily less accurate than underwriting for a repeat borrower with twelve months of repayment history. Building loan products with small initial amounts and short tenors that generate repayment data quickly, then using that data to graduate borrowers to larger, longer loans at better rates, is the most reliable path to accurate underwriting in data-sparse markets.
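The graduation logic described above can be sketched as a simple policy. The specific numbers here — a small first loan, a growth multiplier, a cap — are illustrative assumptions, not a recommended product design:

```python
def next_offer(current_amount, history, first_loan=100, growth=1.5, max_amount=2000):
    """Graduate the limit only after a clean repayment record at the current tier."""
    if not history:
        return first_loan   # small, short first loan to generate repayment data quickly
    if all(loan["repaid_on_time"] for loan in history):
        return min(round(current_amount * growth), max_amount)
    return current_amount   # hold the limit (in practice, perhaps reduce) after a late repayment

offer = next_offer(0, [])                                  # first-time borrower
offer = next_offer(100, [{"repaid_on_time": True}])        # graduated after clean repayment
```

Each repaid loan generates exactly the data the underwriting model lacked at origination, which is why small-and-short-first products outperform attempts to underwrite large initial loans from thin data.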
The promise of AI-driven credit underwriting — that objective data-driven models will be fairer than human underwriters subject to conscious and unconscious bias — has not always been borne out in practice. Machine learning models can perpetuate and amplify discrimination if they are trained on historical lending data that reflects past discriminatory practices, or if they use features that correlate with protected characteristics like race, gender, or national origin.
This is not a theoretical concern. Several high-profile cases in the United States have shown that algorithmic lending models can produce outcomes that disadvantage minority borrowers even when race is not explicitly included as a feature — because features that seem neutral, like zip code, educational institution, or employment sector, can serve as proxies for race in ways the model learns but humans do not notice during design.
Responsible AI credit underwriting requires explicit fairness testing across demographic subgroups, explainability mechanisms that allow borrowers to understand why they were denied credit and take steps to improve their profiles, and human oversight of model outputs — particularly for edge cases and denied applications. The companies building in this space that we admire most are those that treat fairness not as a compliance checkbox but as a core design principle that shapes every feature selection, model architecture, and deployment decision.
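One common form of the subgroup fairness testing mentioned above is comparing approval rates across demographic groups and flagging large gaps — for instance, against the "four-fifths" adverse impact ratio used in US employment and fair-lending analysis. The sketch below uses fabricated decisions and group labels purely for illustration:

```python
from collections import defaultdict

def approval_rates(decisions):
    """decisions: list of (group, approved) pairs from a holdout evaluation set."""
    totals, approved = defaultdict(int), defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        approved[group] += ok
    return {g: approved[g] / totals[g] for g in totals}

def adverse_impact_ratio(rates):
    """Lowest group approval rate over highest; values below 0.8 warrant investigation."""
    return min(rates.values()) / max(rates.values())

# Fabricated outcomes: group A approved 80/100, group B approved 55/100
decisions = [("A", 1)] * 80 + [("A", 0)] * 20 + [("B", 1)] * 55 + [("B", 0)] * 45
rates = approval_rates(decisions)
air = adverse_impact_ratio(rates)
```

A low ratio does not prove discrimination — base rates can differ across groups — but it is the kind of tripwire that triggers the human review and feature audits described above.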
Blok AI Capital has developed a framework for evaluating AI lending companies that we believe separates responsible inclusion-focused lenders from those who use the language of inclusion while building extractive products.
If you are building in AI-driven lending for underserved populations, we would welcome a conversation about what you are building and how Blok AI Capital might support your work. Connect with us via the contact page.