Running the Analysis

This section provides a brief overview of the main steps required to run the PACTA for Banks analysis. It starts with a high-level overview and elaborates on each step in more detail below. The whole process is demonstrated with code-snippets so that following them in your own environment is as straight-forward as possible.

Structure of the Workflow

Assuming you have prepared and/or procured all required input data sets as described in the section “Preparatory Steps”, implementing the PACTA for Banks analysis now follows the following main steps:

Setup: Load the necessary packages and import the data.
Matching process: Match the counterparties in the raw input loan book with the companies in the asset-based company data.
Calculate PACTA metrics: Run the PACTA analysis to obtain alignment metrics for the matched loan book relative to benchmarks based on transition scenarios.
Plot PACTA metrics: Visualize the results of the PACTA analysis using the standard PACTA plots to facilitate interpretation.

Fig. 1: Structure of the Workflow

In principle, these steps are sequential. However, in practice it has shown that one rarely ever achieves the best possible match rate in one iteration. Therefore, it may be required to iterate the matching process multiple times. We will cover these steps in more detail now.

Setup

We use the {pacta.loanbook} package to access the most important functions you’ll learn about. We also use example datasets from the package {pacta.loanbook}, and optional but convenient functions from the packages {dplyr}, {readxl}, and {readr}.

library(pacta.loanbook)
library(dplyr)
library(readxl)
library(readr)

Import Data

We need two datasets to begin: a loanbook and an asset-based company dataset (abcd). For more detail about the necessary structure of these datasets, see the data dictionaries for loanbook and abcd.

To simulate having these data files prepared, we will use the demo data included in the {pacta.loanbook} package to create example XLS files using the {writexl} package.

library(writexl)

writexl::write_xlsx(
  x = loanbook_demo,
  path = file.path(tempdir(), "loanbook.xlsx")
)

writexl::write_xlsx(
  x = abcd_demo,
  path = file.path(tempdir(), "abcd.xlsx")
)

Typically, these data files are stored in XLS files on your computer. As an example, they could be imported as in the code below using the {readxl} package, however, if you use custom data that does not strictly follow the format as defined in the previous section, you may need to adjust the import code and/or do some data manipulation in R after importing.

loanbook <- readxl::read_excel(path = file.path(tempdir(), "loanbook.xlsx"))

loanbook
#> # A tibble: 283 × 13
#>    id_loan id_direct_loantaker name_direct_loantaker       id_ultimate_parent name_ultimate_parent                   loan_size_outstanding loan_size_outstanding_currency loan_size_credit_limit loan_size_credit_limit_currency sector_classification_system sector_classification_direct_loantaker lei_direct_loantaker isin_direct_loantaker
#>    <chr>   <chr>               <chr>                       <chr>              <chr>                                                  <dbl> <chr>                                           <dbl> <chr>                           <chr>                        <chr>                                  <chr>                <lgl>                
#>  1 L1      C294                Vitale Group                UP15               Scholz KGaA                                           225625 EUR                                          18968805 EUR                             NACE                         D35.11                                 <NA>                 NA                   
#>  2 L2      C293                Moen-Moen                   UP84               <NA>                                                  301721 EUR                                          19727961 EUR                             NACE                         D35.11                                 <NA>                 NA                   
#>  3 L3      C292                Rowe-Rowe                   UP288              Harvey, Harvey and Harvey                             410297 EUR                                          20811147 EUR                             NACE                         D35.11                                 801600ATB1D513IUFZ77 NA                   
#>  4 L4      C299                Fadel-Fadel                 UP54               <NA>                                                  233049 EUR                                          19042869 EUR                             NACE                         D35.11                                 <NA>                 NA                   
#>  5 L5      C305                Ring AG & Co. KGaA          UP104              Conti, Conti e Conti SPA                              406585 EUR                                          20774115 EUR                             NACE                         D35.11                                 <NA>                 NA                   
#>  6 L6      C304                Kassulke-Kassulke           UP83               Farrell, Farrell and Farrell                          185721 EUR                                          18570711 EUR                             NACE                         D35.11                                 108200TUBGG1DR0QR635 NA                   
#>  7 L7      C227                Morissette Group            UP134              Weller AG                                             184793 EUR                                          18561453 EUR                             NACE                         D35.11                                 <NA>                 NA                   
#>  8 L8      C303                Barone s.r.l.               UP163              Williamsonn PLC                                       291513 EUR                                          19626123 EUR                             NACE                         D35.11                                 <NA>                 NA                   
#>  9 L9      C301                Werner Werner AG & Co. KGaA UP138              Leone-Leone s.r.l.                                    407513 EUR                                          20783373 EUR                             NACE                         D35.11                                 <NA>                 NA                   
#> 10 L10     C302                De rosa s.r.l.              UP32               Benedetti, Benedetti e Benedetti Group                186649 EUR                                          18579969 EUR                             NACE                         D35.11                                 <NA>                 NA                   
#> # ℹ 273 more rows

abcd <- readxl::read_excel(path = file.path(tempdir(), "abcd.xlsx"))

abcd
#> # A tibble: 4,972 × 12
#>    company_id name_company                          lei                  sector technology          production_unit  year production emission_factor plant_location is_ultimate_owner emission_factor_unit             
#>    <chr>      <chr>                                 <chr>                <chr>  <chr>               <chr>           <dbl>      <dbl>           <dbl> <chr>          <lgl>             <chr>                            
#>  1 175        Giordano, Giordano e Giordano e figli 18500033XH6RG332SX89 power  hydrocap            MW               2020    121032.          NA     MY             TRUE              tonnes of CO2 per per hour per MW
#>  2 175        Giordano, Giordano e Giordano e figli 18500033XH6RG332SX89 power  hydrocap            MW               2021    119274.          NA     MY             TRUE              tonnes of CO2 per per hour per MW
#>  3 175        Giordano, Giordano e Giordano e figli 18500033XH6RG332SX89 power  hydrocap            MW               2022    117515.          NA     MY             TRUE              tonnes of CO2 per per hour per MW
#>  4 175        Giordano, Giordano e Giordano e figli 18500033XH6RG332SX89 power  hydrocap            MW               2023    115757.          NA     MY             TRUE              tonnes of CO2 per per hour per MW
#>  5 175        Giordano, Giordano e Giordano e figli 18500033XH6RG332SX89 power  hydrocap            MW               2024    113999.          NA     MY             TRUE              tonnes of CO2 per per hour per MW
#>  6 175        Giordano, Giordano e Giordano e figli 18500033XH6RG332SX89 power  hydrocap            MW               2025    112240.          NA     MY             TRUE              tonnes of CO2 per per hour per MW
#>  7 526        Vitali, Vitali e Vitali s.r.l.        887100CAVZ6Y3KYT3Q90 cement integrated facility tonnes per year  2020   6327858.           0.723 CI             TRUE              tonnes of CO2 per tonne of cement
#>  8 526        Vitali, Vitali e Vitali s.r.l.        887100CAVZ6Y3KYT3Q90 cement integrated facility tonnes per year  2021   6569983.           0.735 CI             TRUE              tonnes of CO2 per tonne of cement
#>  9 526        Vitali, Vitali e Vitali s.r.l.        887100CAVZ6Y3KYT3Q90 cement integrated facility tonnes per year  2022   6812108.           0.746 CI             TRUE              tonnes of CO2 per tonne of cement
#> 10 526        Vitali, Vitali e Vitali s.r.l.        887100CAVZ6Y3KYT3Q90 cement integrated facility tonnes per year  2023   7054232.           0.758 CI             TRUE              tonnes of CO2 per tonne of cement
#> # ℹ 4,962 more rows

Matching Process

The next step after loading all required input files is to run the matching process, where you will match loans from the loanbook data with production data from the abcd.

The matching process is divided into three main steps:

Apply a matching algorithm to calculate scores of the match precision between the loanbook and abcd datasets
Manually validate matches between the loanbook and abcd datasets, including selecting the appropriate match in cases where more than one company in the abcd receives a score higher than the threshold value
Prioritize validated matches by level

We will cover each of these steps in the following sections.

Score the match precision between the loanbook and abcd datasets

match_name() scores the match between names in a loanbook dataset and names in an asset-based company dataset. The names come from the columns name_direct_loantaker and name_ultimate_parent of the loanbook dataset, and from the column name_company of the asset-based company dataset. In the loan book data set, it is possible to optionally add any number of name_intermediate_parent_* columns, where * indicates the level up the corporate tree from direct_loantaker.

The raw names are internally transformed applying best-practices commonly used in name matching algorithms, such as:

Remove special characters
Replace language specific characters
Abbreviate certain names to reduce their importance in the matching
Remove corporate suffixes when necessary
Spell out numbers to increase their importance

The similarity is then scored between the internally-transformed names of the loanbook against the names in the abcd. For more information on the scoring algorithm used, see stringdist::stringsim().

The basic usage of match_name() is as follows:

matched <- match_name(loanbook, abcd)

matched
#> # A tibble: 326 × 22
#>    id_loan id_direct_loantaker name_direct_loantaker       id_ultimate_parent name_ultimate_parent         loan_size_outstanding loan_size_outstanding_currency loan_size_credit_limit loan_size_credit_limit_currency sector_classification_system sector_classification_direct_loantaker lei_direct_loantaker isin_direct_loantaker id_2dii level   sector sector_abcd name  name_abcd score source borderline
#>    <chr>   <chr>               <chr>                       <chr>              <chr>                                        <dbl> <chr>                                           <dbl> <chr>                           <chr>                        <chr>                                  <chr>                <lgl>                 <chr>   <chr>   <chr>  <chr>       <chr> <chr>     <dbl> <chr>  <lgl>     
#>  1 L1      C294                Vitale Group                UP15               Scholz KGaA                                 225625 EUR                                          18968805 EUR                             NACE                         D35.11                                 <NA>                 NA                    UP190   ultima… power  power       Scho… Scholz K… 1     loanb… FALSE     
#>  2 L3      C292                Rowe-Rowe                   UP288              Harvey, Harvey and Harvey                   410297 EUR                                          20811147 EUR                             NACE                         D35.11                                 801600ATB1D513IUFZ77 NA                    UP101   ultima… power  power       Harv… Harvey, … 1     loanb… FALSE     
#>  3 L5      C305                Ring AG & Co. KGaA          UP104              Conti, Conti e Conti SPA                    406585 EUR                                          20774115 EUR                             NACE                         D35.11                                 <NA>                 NA                    UP39    ultima… power  power       Cont… Conti, C… 1     loanb… FALSE     
#>  4 L6      C304                Kassulke-Kassulke           UP83               Farrell, Farrell and Farrell                185721 EUR                                          18570711 EUR                             NACE                         D35.11                                 108200TUBGG1DR0QR635 NA                    DL129   direct… power  power       Kass… Kassulke… 1     loanb… FALSE     
#>  5 L6      C304                Kassulke-Kassulke           UP83               Farrell, Farrell and Farrell                185721 EUR                                          18570711 EUR                             NACE                         D35.11                                 108200TUBGG1DR0QR635 NA                    UP63    ultima… power  power       Farr… Farrell,… 1     loanb… FALSE     
#>  6 L7      C227                Morissette Group            UP134              Weller AG                                   184793 EUR                                          18561453 EUR                             NACE                         D35.11                                 <NA>                 NA                    DL179   direct… power  power       Mori… Moretti … 0.911 loanb… FALSE     
#>  7 L7      C227                Morissette Group            UP134              Weller AG                                   184793 EUR                                          18561453 EUR                             NACE                         D35.11                                 <NA>                 NA                    UP224   ultima… power  power       Well… Weller AG 1     loanb… FALSE     
#>  8 L8      C303                Barone s.r.l.               UP163              Williamsonn PLC                             291513 EUR                                          19626123 EUR                             NACE                         D35.11                                 <NA>                 NA                    UP228   ultima… power  power       Will… Williams… 0.987 loanb… FALSE     
#>  9 L9      C301                Werner Werner AG & Co. KGaA UP138              Leone-Leone s.r.l.                          407513 EUR                                          20783373 EUR                             NACE                         D35.11                                 <NA>                 NA                    DL274   direct… power  power       Wern… Wende We… 0.844 loanb… FALSE     
#> 10 L9      C301                Werner Werner AG & Co. KGaA UP138              Leone-Leone s.r.l.                          407513 EUR                                          20783373 EUR                             NACE                         D35.11                                 <NA>                 NA                    UP132   ultima… power  power       Leon… Leone-Le… 1     loanb… FALSE     
#> # ℹ 316 more rows

match_name() defaults to scoring matches between name strings that belong to the same sector. This implies that a potential match will only be shown if the names in the loanbook and the abcd are sufficiently similar AND the loan is classified in the same sector as the company activity in the abcd. Using by_sector = FALSE removes this limitation – increasing computation time, and the number of potentially incorrect matches to manually validate. In most cases, it is recommended to keep by_sector = TRUE. However, lifting the restriction can be helpful in cases where the sector classification in the loanbook is not reliable or no sector classification is available at all. Below you can see that the removal of the restriction increases the number of potential matches.

match_name(loanbook, abcd, by_sector = FALSE) %>% nrow()
#> [1] 656

# Compare to
match_name(loanbook, abcd, by_sector = TRUE) %>% nrow()
#> [1] 326

The min_score argument allows you to set a minimum threshold score. This is useful when you either want to filter out potential matches that are not precise enough, or want to add additional potential matches because the given suggestions do not seem to provide sufficient matching coverage. The default value is 0.8, but you can set it to any value between 0 and 1. Since a higher score implies a closer match between company names in the loanbook and the abcd, fewer potential matches will usually be shown. The following code shows how to set the threshold to 0.9:

match_name(loanbook, abcd, min_score = 0.9) %>%
  pull(score) %>%
  range()
#> [1] 0.9007692 1.0000000

Manually validate matches

After the initial calculation of scores for potential matches, you will manually have to inspect the suggested matches and decide which ones to keep or remove. This is especially important when using text-based matching because there is no guarantee that similar company names as identified by the algorithm actually refer to the same companies. The manual validation step is therefore crucial, it ensures data quality in the remainder of the analysis.

The validation process is not automated and requires some time and effort on your part. This is often the most time-consuming part of the analysis, especially when trying to improve the matching coverage in multiple iterations.

Below you will find a brief description of the manual validation process:

Write the output of match_name() into a .csv file, for example using {readr} to save a CSV file:

readr::write_csv(x = matched, file = file.path(tempdir(), "matched.csv"))

Compare, edit, and save the data manually:

Open matched.csv with any spreadsheet editor (Excel, Google Sheets, etc.).
Compare the columns name and name_abcd manually to determine if the match is valid. Other information can be used in conjunction with just the names to ensure the two entities match (sector, internal information on the company structure, etc.)
Edit the data:
- If you are happy with the match, set the score value to 1.
- Otherwise set or leave the score value to anything other than 1.
Save the edited file as, for example, valid_matches.csv.

Re-import the edited file (validated), for example using {readr} to read a CSV file:

valid_matches <- readr::read_csv(file = file.path(tempdir(), "valid_matches.csv"), show_col_types = FALSE)

You can find more detailed information about the matching process under the header “PACTA for Banks Training Webinar 2” in the training materials section of the PACTA for Banks website and in the corresponding slide deck.

Prioritize validated matches by level

The validated dataset may have multiple matches per loan. Consider the case where a loan is given to “Acme Power USA”, a subsidiary of “Acme Power Co.”. There may be both “Acme Power USA” and “Acme Power Co.” in the abcd, and so there could be two valid matches for this loan. To get the best match only, use prioritize() – it picks rows where score is 1 and level per loan is of highest priority:

some_interesting_columns <- c("id_2dii", "level", "score")

prioritized_matches <-
  valid_matches %>%
  prioritize()

prioritized_matches %>% select(all_of(some_interesting_columns))
#> # A tibble: 177 × 3
#>    id_2dii level            score
#>    <chr>   <chr>            <dbl>
#>  1 DL129   direct_loantaker     1
#>  2 DL144   direct_loantaker     1
#>  3 DL270   direct_loantaker     1
#>  4 DL86    direct_loantaker     1
#>  5 DL5     direct_loantaker     1
#>  6 DL80    direct_loantaker     1
#>  7 DL150   direct_loantaker     1
#>  8 DL3     direct_loantaker     1
#>  9 DL65    direct_loantaker     1
#> 10 DL79    direct_loantaker     1
#> # ℹ 167 more rows

By default, highest priority refers to the most granular match (direct_loantaker). The default priority is set internally via prioritize_level().

prioritize_level(valid_matches)
#> [1] "direct_loantaker" "ultimate_parent"

You may use a different priority. One way to do that is to pass a function to priority. For example, use rev() to reverse the default priority.

valid_matches %>%
  prioritize(priority = rev) %>%
  select(all_of(some_interesting_columns))
#> # A tibble: 177 × 3
#>    id_2dii level           score
#>    <chr>   <chr>           <dbl>
#>  1 UP190   ultimate_parent     1
#>  2 UP101   ultimate_parent     1
#>  3 UP39    ultimate_parent     1
#>  4 UP63    ultimate_parent     1
#>  5 UP224   ultimate_parent     1
#>  6 UP132   ultimate_parent     1
#>  7 UP12    ultimate_parent     1
#>  8 UP20    ultimate_parent     1
#>  9 UP134   ultimate_parent     1
#> 10 UP127   ultimate_parent     1
#> # ℹ 167 more rows

A Note on Sector Classifications and the `borderline` Flag

A note on sector classification

Matches are preferred when the sector from the loanbook matches the sector from the abcd. The loanbook sector is determined internally by mapping the sector classification code from the loan book to a PACTA sector, using the sector_classification_system and sector_classification_direct_loantaker columns. Currently, the following options are supported for sector_classification_system, specified in sector_classifications:

GICS, ISIC, NACE, NAICS, PSIC, SIC

The `borderline` flag

An issue arises when, for example, a company is classified in the “power transmission” sector. In a perfect world, these companies would produce no electricity, and we would not try to match them. In practice, however, we find there is often overlap, as companies may have integrated business models that span multiple steps of the supply chain or sector classification systems do not break down every code clearly enough to map it to PACTA sectors unambiguously. For this reason, we introduced the borderline flag.

In the example below, we see two classification codes coming from the NACE classification standard:

nace_classification %>% 
  filter(code %in% c("D35.11", "D35.14", "D35.1"))
#> # A tibble: 3 × 6
#>   original_code description                      code  sector borderline version
#>   <chr>         <chr>                            <chr> <chr>  <lgl>      <chr>  
#> 1 35.1          35.1 Electric power generation,… D35.1 power  TRUE       2.1    
#> 2 35.11         35.11 Production of electricity… D35.… power  FALSE      2.1    
#> 3 35.14         35.14 Distribution of electrici… D35.… power  TRUE       2.1

Notice that the code D35.11 corresponds to power generation. This is an identical match to power sector in PACTA, and thus the borderline flag is set to FALSE. In contrast, code D35.14 corresponds to the distribution of electricity. In a perfect world, we would set this code to not in scope, however there is still a chance that these companies produce electricity. For this reason, we have mapped it to power with borderline = TRUE. Finally, code D35.1 corresponds to the wider electricity sector, covering “Electric power generation, transmission and distribution”. This is a case where the granularity is insufficient to unambiguously match the code to the power sector as defined in PACTA, and thus the borderline flag is set to TRUE.

In practice, if a company has a sector code with borderline is TRUE and is matched, then consider the company in scope. If it has a borderline value TRUE and is not matched, then consider it out of scope.

Optional: Prepare own sector classification mapper

If you can obtain sector codes in any one of those classification systems for your loan book, it is strongly recommended using them. If you do not have your loans mapped to any of these systems, you can either run the matching without a sector classification (see above), or you can use a custom sector classification system.

If you want to use your own custom sector classification system, you will need to create a file that follows the same structure as the sector classification files in sector_classifications, that is, the file needs to contain the following columns (of the following types):

sector: <character>
borderline: <logical>
code: <character>
code_system: <character>

Please notice that when manually preparing a custom sector classification file, you need to decide for every code of your mapping system which PACTA sector the code maps to. This requires the system to be relatively granular to capture the nuances of the supply chains of the companies in your loan book. As discussed above, any sector code that is not unambiguously in scope or out of scope for a given PACTA sector should be marked TRUE in the borderline column. Any sector code that is clearly not in scope should be marked FALSE and receive the sector value not in scope. Any sector code that is clearly in scope should be marked FALSE for borderline and receive the PACTA sector value that it maps to in the sector column.

ADD EXAMPLE

Calculate the Match Success Rate

{pacta.loanbook} allows you to match loans from your loanbook to the companies in an asset-based company dataset. However, matching every loan is unlikely – some loan-taking companies may be missing from the asset-based company dataset, or they may not operate in the sectors PACTA focuses on (power, cement, oil and gas, aviation, coal, automotive, steel, and hdv). Thus, you may want to measure how much of the loanbook matched some asset. This article shows two ways to calculate such matching coverage:

Calculate the portion of your loanbook covered, by dollar value (i.e. using one of the loan_size_* columns).
Count the number of companies matched.

First we will need to load the additional useful package {ggplot2}:

library(ggplot2)

We will use the example datasets created in the previous section. To demonstrate our point, we create a loanbook dataset with two mismatching loans:

loanbook <- loanbook %>% 
  mutate(
    name_ultimate_parent = 
      ifelse(id_loan == "L1", "unmatched company name", name_ultimate_parent),
    sector_classification_direct_loantaker = 
      ifelse(id_loan == "L2", "99", sector_classification_direct_loantaker)
  )

We will then run the matching algorithm on this loanbook:

matched <- loanbook %>% 
    match_name(abcd) %>% 
    prioritize()
#> Warning: Some `sector_classification_direct_loantaker` are unknown:99
#> ℹ If you wish to request a new sector classification system, please create an issue here:
#> • https://github.com/RMI-PACTA/r2dii.data/issues
#> ℹ Or contact the PACTA team here:
#> • [email protected]

Note that this matched dataset will contain only loans that were matched successfully. To determine coverage, we need to go back to the original loanbook dataset. We must determine the PACTA sectors of each loan, as dictated by the sector_classification_direct_loantaker column.

For this, we join the loanbook with the sector_classifications dataset, which lists all sector classification code standards used by ‘PACTA’. Unfortunately we need to work around two caveats (you may ignore them because they are conceptually uninteresting):

In the two datasets, the columns we want to merge by have different names. We use the argument by of dplyr::left_join() to merge the columns sector_classification_system and sector_classification_direct_loantaker (from loanbook) with the columns code_system and code (from sector_classifications), respectively.
In the two datasets, the sector classification codes are represented with different data-types. We modify the column sector_classification_direct_loantaker before using dplyr::left_join() so it has the same type as the corresponding column code (otherwise dplyr::left_join() throws an error), and again after dplyr::left_join() to restore its original type.

merge_by <- c(
    sector_classification_system = "code_system",
    sector_classification_direct_loantaker = "code"
  )

loanbook_with_sectors <- loanbook %>% 
  left_join(sector_classifications, by = merge_by) %>% 
  mutate(sector_classification_direct_loantaker = as.character(sector_classification_direct_loantaker))

We can join these two datasets together, to generate our coverage dataset:

coverage <- left_join(loanbook_with_sectors, matched) %>% 
  mutate(
    loan_size_outstanding = as.numeric(loan_size_outstanding),
    loan_size_credit_limit = as.numeric(loan_size_credit_limit),
    matched = case_when(
      score == 1   ~ "Matched", 
      is.na(score) ~ "Not Matched",
      TRUE         ~ "Not Matched"
    ),
    sector = case_when(
      borderline == TRUE & matched == "Not Matched" ~ "not in scope",
      TRUE ~ sector
    )
  )
#> Joining with `by = join_by(id_loan, id_direct_loantaker, name_direct_loantaker,
#> id_ultimate_parent, name_ultimate_parent, loan_size_outstanding,
#> loan_size_outstanding_currency, loan_size_credit_limit,
#> loan_size_credit_limit_currency, sector_classification_system,
#> sector_classification_direct_loantaker, lei_direct_loantaker,
#> isin_direct_loantaker, sector, borderline)`

1. Calculate the portion of your loanbook covered by dollar value

From the coverage dataset, we can calculate the total loanbook coverage by dollar value. Let’s create two helper functions, one to calculate dollar-value and another one to plot coverage in general.

dollar_value <- function(data, ...) {
  data %>% 
    summarize(loan_size_outstanding = sum(loan_size_outstanding), .by = c(matched, ...))
}

plot_coverage <- function(data, x, y) {
  ggplot(data) + 
    geom_col(aes({{x}}, {{y}}, fill = matched)) +
    # Use more horizontal space -- avoids overlap on x axis text
    theme(legend.position = "top")
}

Let’s first explore all loans.

coverage %>% 
  dollar_value() %>% 
  plot_coverage(matched, loan_size_outstanding)

To calculate the total, in-scope, loanbook coverage:

coverage %>% 
  filter(sector != "not in scope") %>% 
  dollar_value() %>% 
  plot_coverage(matched, loan_size_outstanding)

Break down by sector

You may break-down the plot by sector:

coverage %>% 
  dollar_value(sector) %>% 
  plot_coverage(sector, loan_size_outstanding)

Or even further, by matching level:

coverage %>% 
  mutate(matched = case_when(
    matched == "Matched" & level == "direct_loantaker"      ~ "Matched DL",
    matched == "Matched" & level == "intermediate_parent_1" ~ "Matched IP1",
    matched == "Matched" & level == "ultimate_parent"       ~ "Matched UP",
    matched == "Not Matched"                                ~ "Not Matched",
    TRUE                                                    ~ "Catch unknown"
  )) %>% 
  dollar_value(sector) %>% 
  plot_coverage(sector, loan_size_outstanding)

2. Count the number of companies

You might also be interested in knowing how many companies in your loanbook were matched. It probably makes most sense to do this at the direct_loantaker level:

companies_matched <- coverage %>% 
  summarize(no_companies = n_distinct(name_direct_loantaker), .by = c(sector, matched))

companies_matched %>% 
  plot_coverage(sector, no_companies)

Optional: Handling misclassified loans

A loan is considered misclassified, if it is identified as in-scope of a PACTA sector based on the sector classification in the input loan book, but if it turns out that this classification is incorrect based on the counterparty’s activities. For example, if a counterparty is classified as a fully in-scope power company, but in reality it is a power distribution company or even active in another sector altogether, such as upstream gas production, the counterparty is misclassified. Misclassification is a problem, because it leads to exaggeratedly low match success rates, that cannot be improved by iterating the matching process. You can search as long as you want, but a company that in reality does not operate any power plants will not be found in the ABCD regardless of how the settings of the matching algorithm are tweaked. The match success rate can only be a useful metric if the denominator, that is the classification in the loan book, is solid. This is also important to understand which loans are truly not matched. This category should only be companies that actually are active in a given in-scope sector, but were not found in the ABCD.

Identifying misclassified loans is usually a manual process, where counterparties that have not been matched are researched to determine if they should be in scope or not. Depending on the size of the loan book, this can be a time-consuming process. It is recommended to start with the unmatched counterparties with the largest loans, as these have the largest impact on the match success rate. You may then have to decide for yourself, if you want to research all unmatched loans or set a cutoff point, e.g. based on the size of the loan.

Once the list of misclassified loans is identified, there are two broad options to appropriately these cases.

Correct the classification in the raw loan book and re-run the matching process. If the loan was clearly misclassified, this may be the most appropriate way to handle the issue. It may be a good idea to keep notes on any such changes made in the input data though, to ensure the result is fully replicable. The upside of this approach is that the loan will now either be matched correctly, as it will be assigned the sector that the company should have and therefore find an entry in the ABCD data set to match against. Or, if there is still no match to be found in the ABCD, the loan will correctly be missing in the appropriate sector and therefore indicate a lower match success rate where it should. Lastly, if it should have been "not in scope", it will not inflate the denominator of the match success rate anymore.
If a manual re-classification of the raw loan book is not possible or desired, the calculation of the match success rate can be corrected by creating a file loans_to_remove.csv, which should include the column id_loan to indicate the precise misclassified loan that was identified. This loan then needs to be explicitly removed from the match success calculation, as shown below.

Let’s assume the the loan with id L255 is falsely classified as a steel company. Research shows this company is not a primary steel producer, but it makes steel-based products for end use. Hence it is active downstream in the value chain and cannot be matched to the ABCD. If we cannot correct the input sector classification, we instead create a csv file with the id of the misclassified loan. In practice, this can be done programmatically or manually.

readr::write_csv(
  x = tibble(id_loan = "L255"),
  file = file.path(tempdir(), "loans_to_remove.csv")
)

The loans_to_remove.csv file should have just one column, id_loan, and therefore be structured as follows:

id_loan: <character>

We then read in the file and remove the misclassified loan from the match success rate calculation. We compare the coverage before and after the removal of the misclassified loan to see the impact of the correction. For demonstration purposes, we artificially increase the loan_size_outstanding of the affected counterparty while focusing on the steel sector only, to see the clear impact.

coverage_compare <- coverage %>% 
  mutate(
    loan_size_outstanding = ifelse(id_loan == "L255", 50000000, loan_size_outstanding)
  ) %>% 
  filter(sector == "steel")

coverage_compare %>%
  mutate(matched = case_when(
    matched == "Matched" & level == "direct_loantaker"      ~ "Matched DL",
    matched == "Matched" & level == "intermediate_parent_1" ~ "Matched IP1",
    matched == "Matched" & level == "ultimate_parent"       ~ "Matched UP",
    matched == "Not Matched"                                ~ "Not Matched",
    TRUE                                                    ~ "Catch unknown"
  )) %>% 
  dollar_value(sector) %>% 
  plot_coverage(sector, loan_size_outstanding)


loans_to_remove <- readr::read_csv(file.path(tempdir(), "loans_to_remove.csv"))

coverage_corrected <- coverage_compare %>% 
  anti_join(loans_to_remove, by = "id_loan")

coverage_corrected %>% 
  mutate(matched = case_when(
    matched == "Matched" & level == "direct_loantaker"      ~ "Matched DL",
    matched == "Matched" & level == "intermediate_parent_1" ~ "Matched IP1",
    matched == "Matched" & level == "ultimate_parent"       ~ "Matched UP",
    matched == "Not Matched"                                ~ "Not Matched",
    TRUE                                                    ~ "Catch unknown"
  )) %>% 
  dollar_value(sector) %>% 
  plot_coverage(sector, loan_size_outstanding)

The reason why it is a good idea to either correct misclassified loans or disregard them in the calculation of the match success rate is that a misclassified loan cannot possibly be matched in a given sector. Therefore, no amount of work would be sufficient to improve the sector match success rate, because it is calculated against an incorrect baseline. Technically, the user is not forced to correct misclassifications, and there may be a limit to how much time should be spent on this, but it is recommended to at least correct large misclassified loans.

…

Calculate PACTA alignment metrics

You can calculate scenario targets using two different approaches: Market Share Approach, or Sectoral Decarbonization Approach.

Market Share Approach

The Market Share Approach is used to calculate scenario targets for the production of a technology in a sector. For example, we can use this approach to set targets for the production of electric vehicles in the automotive sector. This approach is recommended for sectors where a granular technology scenario roadmap exists.

Targets can be set at the portfolio level:

# Use these datasets to practice but eventually you should use your own data.
scenario <- scenario_demo_2020
regions <- region_isos_demo

market_share_targets_portfolio <-
  target_market_share(
    data = prioritized_matches,
    abcd = abcd,
    scenario = scenario,
    region_isos = regions
  )

market_share_targets_portfolio
#> # A tibble: 1,076 × 10
#>    sector     technology  year region scenario_source metric     production technology_share scope  percentage_of_initial_production_by_scope
#>    <chr>      <chr>      <dbl> <chr>  <chr>           <chr>           <dbl>            <dbl> <chr>                                      <dbl>
#>  1 automotive electric    2020 global demo_2020       projected     145649.            0.165 sector                                  0       
#>  2 automotive electric    2020 global demo_2020       target_cps    145649.            0.165 sector                                  0       
#>  3 automotive electric    2020 global demo_2020       target_sds    145649.            0.165 sector                                  0       
#>  4 automotive electric    2020 global demo_2020       target_sps    145649.            0.165 sector                                  0       
#>  5 automotive electric    2021 global demo_2020       projected     147480.            0.165 sector                                  0.000739
#>  6 automotive electric    2021 global demo_2020       target_cps    146915.            0.165 sector                                  0.000511
#>  7 automotive electric    2021 global demo_2020       target_sds    153332.            0.167 sector                                  0.00310 
#>  8 automotive electric    2021 global demo_2020       target_sps    147258.            0.165 sector                                  0.000649
#>  9 automotive electric    2022 global demo_2020       projected     149310.            0.165 sector                                  0.00148 
#> 10 automotive electric    2022 global demo_2020       target_cps    148155.            0.164 sector                                  0.00101 
#> # ℹ 1,066 more rows

Or at the company level:

market_share_targets_company <-
  target_market_share(
    data = prioritized_matches,
    abcd = abcd,
    scenario = scenario,
    region_isos = regions,
    by_company = TRUE, # Output results at company-level
    weight_production = FALSE
  )

market_share_targets_company
#> # A tibble: 14,505 × 11
#>    sector     technology  year region scenario_source name_abcd                         metric     production technology_share scope  percentage_of_initial_production_by_scope
#>    <chr>      <chr>      <dbl> <chr>  <chr>           <chr>                             <chr>           <dbl>            <dbl> <chr>                                      <dbl>
#>  1 automotive electric    2020 global demo_2020       Bernardi, Bernardi e Bernardi SPA projected     324592.           0.0759 sector                                         0
#>  2 automotive electric    2020 global demo_2020       Bernardi, Bernardi e Bernardi SPA target_cps    324592.           0.0759 sector                                         0
#>  3 automotive electric    2020 global demo_2020       Bernardi, Bernardi e Bernardi SPA target_sds    324592.           0.0759 sector                                         0
#>  4 automotive electric    2020 global demo_2020       Bernardi, Bernardi e Bernardi SPA target_sps    324592.           0.0759 sector                                         0
#>  5 automotive electric    2020 global demo_2020       Christiansen PLC                  projected     512438.           0.130  sector                                         0
#>  6 automotive electric    2020 global demo_2020       Christiansen PLC                  target_cps    512438.           0.130  sector                                         0
#>  7 automotive electric    2020 global demo_2020       Christiansen PLC                  target_sds    512438.           0.130  sector                                         0
#>  8 automotive electric    2020 global demo_2020       Christiansen PLC                  target_sps    512438.           0.130  sector                                         0
#>  9 automotive electric    2020 global demo_2020       Donati, Donati e Donati s.r.l.    projected     277214.           1      sector                                         0
#> 10 automotive electric    2020 global demo_2020       Donati, Donati e Donati s.r.l.    target_cps    277214.           1      sector                                         0
#> # ℹ 14,495 more rows

These results can be saved/written to a CSV file for potential input to other models and software, for example:

readr::write_csv(x = market_share_targets_portfolio, file = file.path(tempdir(), "market_share_targets_portfolio.csv"))
readr::write_csv(x = market_share_targets_company, file = file.path(tempdir(), "market_share_targets_company.csv"))

Remember to replace tempdir() with the path to the directory that you wish to save your files in.

Sectoral Decarbonization Approach

The Sectoral Decarbonization Approach is used to calculate scenario targets for the emission_factor of a sector. For example, you can use this approach to set targets for the average emission factor of the cement sector. This approach is recommended for sectors lacking technology roadmaps.

# Use this dataset to practice but eventually you should use your own data.
co2 <- co2_intensity_scenario_demo

sda_targets <-
  target_sda(
    data = prioritized_matches,
    abcd = abcd,
    co2_intensity_scenario = co2,
    region_isos = regions
  ) %>% 
  filter(sector == "cement", year >= 2020)
#> Warning: Removing rows in abcd where `emission_factor` is NA

sda_targets
#> # A tibble: 110 × 6
#>    sector  year region             scenario_source emission_factor_metric emission_factor_value
#>    <chr>  <dbl> <chr>              <chr>           <chr>                                  <dbl>
#>  1 cement  2020 advanced economies demo_2020       projected                             0.0230
#>  2 cement  2020 developing asia    demo_2020       projected                             0.0595
#>  3 cement  2020 global             demo_2020       projected                             0.664 
#>  4 cement  2021 advanced economies demo_2020       projected                             0.0232
#>  5 cement  2021 developing asia    demo_2020       projected                             0.0594
#>  6 cement  2021 global             demo_2020       projected                             0.665 
#>  7 cement  2022 advanced economies demo_2020       projected                             0.0233
#>  8 cement  2022 developing asia    demo_2020       projected                             0.0592
#>  9 cement  2022 global             demo_2020       projected                             0.666 
#> 10 cement  2023 advanced economies demo_2020       projected                             0.0235
#> # ℹ 100 more rows

Or at the company level:

sda_targets_company <-
  target_sda(
    data = prioritized_matches,
    abcd = abcd,
    co2_intensity_scenario = co2,
    region_isos = regions,
    by_company = TRUE # Output results at company-level
  )
#> Warning: Removing rows in abcd where `emission_factor` is NA

sda_targets_company
#> # A tibble: 966 × 7
#>    sector  year region             scenario_source name_abcd                          emission_factor_metric emission_factor_value
#>    <chr>  <dbl> <chr>              <chr>           <chr>                              <chr>                                  <dbl>
#>  1 cement  2020 advanced economies demo_2020       Milani, Milani e Milani SPA        projected                              0.714
#>  2 cement  2020 developing asia    demo_2020       Heidrich GmbH                      projected                              0.675
#>  3 cement  2020 developing asia    demo_2020       Hovel Hovel GmbH & Co. KGaA        projected                              0.622
#>  4 cement  2020 global             demo_2020       Battaglia-Battaglia Group          projected                              0.626
#>  5 cement  2020 global             demo_2020       Bergstrom, Bergstrom and Bergstrom projected                              0.607
#>  6 cement  2020 global             demo_2020       Carsten GmbH & Co. KG              projected                              0.625
#>  7 cement  2020 global             demo_2020       Crist-Crist                        projected                              0.655
#>  8 cement  2020 global             demo_2020       Durgan-Durgan                      projected                              0.702
#>  9 cement  2020 global             demo_2020       Eberhardt Eberhardt GmbH           projected                              0.685
#> 10 cement  2020 global             demo_2020       Geissler Geissler AG & Co. KGaA    projected                              0.637
#> # ℹ 956 more rows

These results can be saved/written to a CSV file for potential input to other models and software, for example:

readr::write_csv(x = sda_targets, file = file.path(tempdir(), "sda_targets.csv"))
readr::write_csv(x = sda_targets_company, file = file.path(tempdir(), "sda_targets_company.csv"))

Remember to replace tempdir() with the path to the directory that you wish to save your files in.

Visualize PACTA alignment metrics

There are a large variety of possible visualizations stemming from the outputs of target_market_share() and target_sda(). Below, we highlight a couple of common plots that can easily be created using the {pacta.loanbook} package.

Market Share: Sector-level technology mix

From the market share output, you can plot the portfolio’s exposure to various climate sensitive technologies (projected), and compare with the corporate economy, or against various scenario targets.

# Pick the targets you want to plot.
data <- filter(
  market_share_targets_portfolio,
  scenario_source == "demo_2020",
  sector == "power",
  region == "global",
  metric %in% c("projected", "corporate_economy", "target_sds")
)

# Plot the technology mix
qplot_techmix(data)
#> The `technology_share` values are plotted for extreme years.
#> Do you want to plot different years? E.g. filter . with:`subset(., year %in% c(2020, 2030))`.
#> Warning: Removed 3 rows containing missing values or values outside the scale range
#> (`geom_bar()`).

Market Share: Technology-level volume trajectory

You can also plot the technology-specific volume trend. All starting values are normalized to 1, to emphasize that we are comparing the rates of buildout and/or retirement.

data <- filter(
  market_share_targets_portfolio,
  sector == "power",
  technology == "renewablescap",
  region == "global",
  scenario_source == "demo_2020"
)

qplot_trajectory(data)

SDA Target: Sector-level emission intensity

From the SDA output, we can compare the projected average emission intensity attributed to the portfolio, with the actual emission intensity scenario, and the scenario compliant SDA pathway that the portfolio must follow to achieve the scenario ambition by 2050.

data <- filter(sda_targets, sector == "cement", region == "global")
qplot_emission_intensity(data)

ADD INSTRUCTIONS Write to file

PREVIOUS CHAPTER: Preparatory Steps

NEXT CHAPTER: Interpretation of Results

- Running the Analysis

Running the Analysis

Running the Analysis

Structure of the Workflow

Setup

Import Data

Matching Process

Score the match precision between the loanbook and abcd datasets

Manually validate matches

Prioritize validated matches by level

A Note on Sector Classifications and the borderline Flag

A note on sector classification

The borderline flag

Optional: Prepare own sector classification mapper

Calculate the Match Success Rate

1. Calculate the portion of your loanbook covered by dollar value

Break down by sector

2. Count the number of companies

Optional: Handling misclassified loans

Calculate PACTA alignment metrics

Market Share Approach

Sectoral Decarbonization Approach

Visualize PACTA alignment metrics

Market Share: Sector-level technology mix

Market Share: Technology-level volume trajectory

SDA Target: Sector-level emission intensity

A Note on Sector Classifications and the `borderline` Flag

The `borderline` flag