This section provides a brief overview of the main steps required to run the PACTA for Banks analysis. It starts with a high-level overview and elaborates on each step in more detail below. The whole process is demonstrated with code-snippets so that following them in your own environment is as straight-forward as possible.
Assuming you have prepared and/or procured all required input data sets as described in the section “Preparatory Steps”, implementing the PACTA for Banks analysis now follows the following main steps:
Fig. 1: Structure of the Workflow
In principle, these steps are sequential. However, in practice it has shown that one rarely ever achieves the best possible match rate in one iteration. Therefore, it may be required to iterate the matching process multiple times. We will cover these steps in more detail now.
We use the {pacta.loanbook}
package to access the most
important functions you’ll learn about. We also use example datasets
from the package {pacta.loanbook}
, and optional but
convenient functions from the packages {dplyr}
,
{readxl}
, and {readr}
.
We need two datasets to begin: a loanbook and an asset-based company dataset (abcd). For more detail about the necessary structure of these datasets, see the data dictionaries for loanbook and abcd.
To simulate having these data files prepared, we will use the demo
data included in the {pacta.loanbook}
package to create
example XLS files using the {writexl}
package.
library(writexl)
writexl::write_xlsx(
x = loanbook_demo,
path = file.path(tempdir(), "loanbook.xlsx")
)
writexl::write_xlsx(
x = abcd_demo,
path = file.path(tempdir(), "abcd.xlsx")
)
Typically, these data files are stored in XLS files on your computer.
As an example, they could be imported as in the code below using the
{readxl}
package, however, if you use custom data that does
not strictly follow the format as defined in the previous section, you may
need to adjust the import code and/or do some data manipulation in R
after importing.
loanbook <- readxl::read_excel(path = file.path(tempdir(), "loanbook.xlsx"))
loanbook
#> # A tibble: 283 × 13
#> id_loan id_direct_loantaker name_direct_loantaker id_ultimate_parent name_ultimate_parent loan_size_outstanding loan_size_outstanding_currency loan_size_credit_limit loan_size_credit_limit_currency sector_classification_system sector_classification_direct_loantaker lei_direct_loantaker isin_direct_loantaker
#> <chr> <chr> <chr> <chr> <chr> <dbl> <chr> <dbl> <chr> <chr> <chr> <chr> <lgl>
#> 1 L1 C294 Vitale Group UP15 Scholz KGaA 225625 EUR 18968805 EUR NACE D35.11 <NA> NA
#> 2 L2 C293 Moen-Moen UP84 <NA> 301721 EUR 19727961 EUR NACE D35.11 <NA> NA
#> 3 L3 C292 Rowe-Rowe UP288 Harvey, Harvey and Harvey 410297 EUR 20811147 EUR NACE D35.11 801600ATB1D513IUFZ77 NA
#> 4 L4 C299 Fadel-Fadel UP54 <NA> 233049 EUR 19042869 EUR NACE D35.11 <NA> NA
#> 5 L5 C305 Ring AG & Co. KGaA UP104 Conti, Conti e Conti SPA 406585 EUR 20774115 EUR NACE D35.11 <NA> NA
#> 6 L6 C304 Kassulke-Kassulke UP83 Farrell, Farrell and Farrell 185721 EUR 18570711 EUR NACE D35.11 108200TUBGG1DR0QR635 NA
#> 7 L7 C227 Morissette Group UP134 Weller AG 184793 EUR 18561453 EUR NACE D35.11 <NA> NA
#> 8 L8 C303 Barone s.r.l. UP163 Williamsonn PLC 291513 EUR 19626123 EUR NACE D35.11 <NA> NA
#> 9 L9 C301 Werner Werner AG & Co. KGaA UP138 Leone-Leone s.r.l. 407513 EUR 20783373 EUR NACE D35.11 <NA> NA
#> 10 L10 C302 De rosa s.r.l. UP32 Benedetti, Benedetti e Benedetti Group 186649 EUR 18579969 EUR NACE D35.11 <NA> NA
#> # ℹ 273 more rows
abcd <- readxl::read_excel(path = file.path(tempdir(), "abcd.xlsx"))
abcd
#> # A tibble: 4,972 × 12
#> company_id name_company lei sector technology production_unit year production emission_factor plant_location is_ultimate_owner emission_factor_unit
#> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <lgl> <chr>
#> 1 175 Giordano, Giordano e Giordano e figli 18500033XH6RG332SX89 power hydrocap MW 2020 121032. NA MY TRUE tonnes of CO2 per per hour per MW
#> 2 175 Giordano, Giordano e Giordano e figli 18500033XH6RG332SX89 power hydrocap MW 2021 119274. NA MY TRUE tonnes of CO2 per per hour per MW
#> 3 175 Giordano, Giordano e Giordano e figli 18500033XH6RG332SX89 power hydrocap MW 2022 117515. NA MY TRUE tonnes of CO2 per per hour per MW
#> 4 175 Giordano, Giordano e Giordano e figli 18500033XH6RG332SX89 power hydrocap MW 2023 115757. NA MY TRUE tonnes of CO2 per per hour per MW
#> 5 175 Giordano, Giordano e Giordano e figli 18500033XH6RG332SX89 power hydrocap MW 2024 113999. NA MY TRUE tonnes of CO2 per per hour per MW
#> 6 175 Giordano, Giordano e Giordano e figli 18500033XH6RG332SX89 power hydrocap MW 2025 112240. NA MY TRUE tonnes of CO2 per per hour per MW
#> 7 526 Vitali, Vitali e Vitali s.r.l. 887100CAVZ6Y3KYT3Q90 cement integrated facility tonnes per year 2020 6327858. 0.723 CI TRUE tonnes of CO2 per tonne of cement
#> 8 526 Vitali, Vitali e Vitali s.r.l. 887100CAVZ6Y3KYT3Q90 cement integrated facility tonnes per year 2021 6569983. 0.735 CI TRUE tonnes of CO2 per tonne of cement
#> 9 526 Vitali, Vitali e Vitali s.r.l. 887100CAVZ6Y3KYT3Q90 cement integrated facility tonnes per year 2022 6812108. 0.746 CI TRUE tonnes of CO2 per tonne of cement
#> 10 526 Vitali, Vitali e Vitali s.r.l. 887100CAVZ6Y3KYT3Q90 cement integrated facility tonnes per year 2023 7054232. 0.758 CI TRUE tonnes of CO2 per tonne of cement
#> # ℹ 4,962 more rows
The next step after loading all required input files is to run the matching process, where you will match loans from the loanbook data with production data from the abcd.
The matching process is divided into three main steps:
We will cover each of these steps in the following sections.
match_name()
scores the match between names in a
loanbook dataset and names in an asset-based company dataset. The names
come from the columns name_direct_loantaker
and
name_ultimate_parent
of the loanbook dataset, and from the
column name_company
of the asset-based company dataset. In
the loan book data set, it is possible to optionally add any number of
name_intermediate_parent_*
columns, where *
indicates the level up the corporate tree from
direct_loantaker
.
The raw names are internally transformed applying best-practices commonly used in name matching algorithms, such as:
The similarity is then scored between the internally-transformed
names of the loanbook against the names in the abcd. For more
information on the scoring algorithm used, see
stringdist::stringsim()
.
The basic usage of match_name()
is as follows:
matched <- match_name(loanbook, abcd)
matched
#> # A tibble: 326 × 22
#> id_loan id_direct_loantaker name_direct_loantaker id_ultimate_parent name_ultimate_parent loan_size_outstanding loan_size_outstanding_currency loan_size_credit_limit loan_size_credit_limit_currency sector_classification_system sector_classification_direct_loantaker lei_direct_loantaker isin_direct_loantaker id_2dii level sector sector_abcd name name_abcd score source borderline
#> <chr> <chr> <chr> <chr> <chr> <dbl> <chr> <dbl> <chr> <chr> <chr> <chr> <lgl> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <chr> <lgl>
#> 1 L1 C294 Vitale Group UP15 Scholz KGaA 225625 EUR 18968805 EUR NACE D35.11 <NA> NA UP190 ultima… power power Scho… Scholz K… 1 loanb… FALSE
#> 2 L3 C292 Rowe-Rowe UP288 Harvey, Harvey and Harvey 410297 EUR 20811147 EUR NACE D35.11 801600ATB1D513IUFZ77 NA UP101 ultima… power power Harv… Harvey, … 1 loanb… FALSE
#> 3 L5 C305 Ring AG & Co. KGaA UP104 Conti, Conti e Conti SPA 406585 EUR 20774115 EUR NACE D35.11 <NA> NA UP39 ultima… power power Cont… Conti, C… 1 loanb… FALSE
#> 4 L6 C304 Kassulke-Kassulke UP83 Farrell, Farrell and Farrell 185721 EUR 18570711 EUR NACE D35.11 108200TUBGG1DR0QR635 NA DL129 direct… power power Kass… Kassulke… 1 loanb… FALSE
#> 5 L6 C304 Kassulke-Kassulke UP83 Farrell, Farrell and Farrell 185721 EUR 18570711 EUR NACE D35.11 108200TUBGG1DR0QR635 NA UP63 ultima… power power Farr… Farrell,… 1 loanb… FALSE
#> 6 L7 C227 Morissette Group UP134 Weller AG 184793 EUR 18561453 EUR NACE D35.11 <NA> NA DL179 direct… power power Mori… Moretti … 0.911 loanb… FALSE
#> 7 L7 C227 Morissette Group UP134 Weller AG 184793 EUR 18561453 EUR NACE D35.11 <NA> NA UP224 ultima… power power Well… Weller AG 1 loanb… FALSE
#> 8 L8 C303 Barone s.r.l. UP163 Williamsonn PLC 291513 EUR 19626123 EUR NACE D35.11 <NA> NA UP228 ultima… power power Will… Williams… 0.987 loanb… FALSE
#> 9 L9 C301 Werner Werner AG & Co. KGaA UP138 Leone-Leone s.r.l. 407513 EUR 20783373 EUR NACE D35.11 <NA> NA DL274 direct… power power Wern… Wende We… 0.844 loanb… FALSE
#> 10 L9 C301 Werner Werner AG & Co. KGaA UP138 Leone-Leone s.r.l. 407513 EUR 20783373 EUR NACE D35.11 <NA> NA UP132 ultima… power power Leon… Leone-Le… 1 loanb… FALSE
#> # ℹ 316 more rows
match_name()
defaults to scoring matches between name
strings that belong to the same sector. This implies that a potential
match will only be shown if the names in the loanbook and the abcd are
sufficiently similar AND the loan is classified in the same sector as
the company activity in the abcd. Using by_sector = FALSE
removes this limitation – increasing computation time, and the number of
potentially incorrect matches to manually validate. In most cases, it is
recommended to keep by_sector = TRUE
. However, lifting the
restriction can be helpful in cases where the sector classification in
the loanbook is not reliable or no sector classification is available at
all. Below you can see that the removal of the restriction increases the
number of potential matches.
match_name(loanbook, abcd, by_sector = FALSE) %>% nrow()
#> [1] 656
# Compare to
match_name(loanbook, abcd, by_sector = TRUE) %>% nrow()
#> [1] 326
The min_score
argument allows you to set a minimum
threshold score. This is useful when you either want to filter out
potential matches that are not precise enough, or want to add additional
potential matches because the given suggestions do not seem to provide
sufficient matching coverage. The default value is 0.8, but you can set
it to any value between 0 and 1. Since a higher score implies a closer
match between company names in the loanbook and the abcd, fewer
potential matches will usually be shown. The following code shows how to
set the threshold to 0.9:
After the initial calculation of scores for potential matches, you will manually have to inspect the suggested matches and decide which ones to keep or remove. This is especially important when using text-based matching because there is no guarantee that similar company names as identified by the algorithm actually refer to the same companies. The manual validation step is therefore crucial, it ensures data quality in the remainder of the analysis.
The validation process is not automated and requires some time and effort on your part. This is often the most time-consuming part of the analysis, especially when trying to improve the matching coverage in multiple iterations.
Below you will find a brief description of the manual validation process:
Write the output of match_name()
into a .csv file, for
example using {readr}
to save a CSV file:
Compare, edit, and save the data manually:
matched.csv
with any spreadsheet editor (Excel,
Google Sheets, etc.).name
and name_abcd
manually to determine if the match is valid. Other information can be
used in conjunction with just the names to ensure the two entities match
(sector, internal information on the company structure, etc.)score
value to
1
.score
value to anything
other than 1
.valid_matches.csv
.Re-import the edited file (validated), for example using
{readr}
to read a CSV file:
valid_matches <- readr::read_csv(file = file.path(tempdir(), "valid_matches.csv"), show_col_types = FALSE)
You can find more detailed information about the matching process under the header “PACTA for Banks Training Webinar 2” in the training materials section of the PACTA for Banks website and in the corresponding slide deck.
The validated dataset may have multiple matches per loan. Consider
the case where a loan is given to “Acme Power USA”, a subsidiary of
“Acme Power Co.”. There may be both “Acme Power USA” and “Acme Power
Co.” in the abcd
, and so there could be two valid matches
for this loan. To get the best match only, use prioritize()
– it picks rows where score
is 1 and level
per
loan is of highest priority
:
some_interesting_columns <- c("id_2dii", "level", "score")
prioritized_matches <-
valid_matches %>%
prioritize()
prioritized_matches %>% select(all_of(some_interesting_columns))
#> # A tibble: 177 × 3
#> id_2dii level score
#> <chr> <chr> <dbl>
#> 1 DL129 direct_loantaker 1
#> 2 DL144 direct_loantaker 1
#> 3 DL270 direct_loantaker 1
#> 4 DL86 direct_loantaker 1
#> 5 DL5 direct_loantaker 1
#> 6 DL80 direct_loantaker 1
#> 7 DL150 direct_loantaker 1
#> 8 DL3 direct_loantaker 1
#> 9 DL65 direct_loantaker 1
#> 10 DL79 direct_loantaker 1
#> # ℹ 167 more rows
By default, highest priority refers to the most granular match
(direct_loantaker
). The default priority is set internally
via prioritize_level()
.
You may use a different priority. One way to do that is to pass a
function to priority
. For example, use rev()
to reverse the default priority.
valid_matches %>%
prioritize(priority = rev) %>%
select(all_of(some_interesting_columns))
#> # A tibble: 177 × 3
#> id_2dii level score
#> <chr> <chr> <dbl>
#> 1 UP190 ultimate_parent 1
#> 2 UP101 ultimate_parent 1
#> 3 UP39 ultimate_parent 1
#> 4 UP63 ultimate_parent 1
#> 5 UP224 ultimate_parent 1
#> 6 UP132 ultimate_parent 1
#> 7 UP12 ultimate_parent 1
#> 8 UP20 ultimate_parent 1
#> 9 UP134 ultimate_parent 1
#> 10 UP127 ultimate_parent 1
#> # ℹ 167 more rows
borderline
FlagMatches are preferred when the sector from the loanbook
matches the sector from the abcd
. The loanbook
sector is determined internally by mapping the sector classification
code from the loan book to a PACTA sector, using the
sector_classification_system
and
sector_classification_direct_loantaker
columns. Currently,
the following options are supported for
sector_classification_system
, specified in
sector_classifications
:
GICS, ISIC, NACE, NAICS, PSIC, SIC
borderline
flagAn issue arises when, for example, a company is classified in the
“power transmission” sector. In a perfect world, these companies would
produce no electricity, and we would not try to match them. In practice,
however, we find there is often overlap, as companies may have
integrated business models that span multiple steps of the supply chain
or sector classification systems do not break down every code clearly
enough to map it to PACTA sectors unambiguously. For this reason, we
introduced the borderline
flag.
In the example below, we see two classification codes coming from the NACE classification standard:
nace_classification %>%
filter(code %in% c("D35.11", "D35.14", "D35.1"))
#> # A tibble: 3 × 6
#> original_code description code sector borderline version
#> <chr> <chr> <chr> <chr> <lgl> <chr>
#> 1 35.1 35.1 Electric power generation,… D35.1 power TRUE 2.1
#> 2 35.11 35.11 Production of electricity… D35.… power FALSE 2.1
#> 3 35.14 35.14 Distribution of electrici… D35.… power TRUE 2.1
Notice that the code D35.11 corresponds to power generation. This is
an identical match to power
sector in PACTA, and thus the
borderline
flag is set to FALSE
. In contrast,
code D35.14 corresponds to the distribution of electricity. In a perfect
world, we would set this code to not in scope
, however
there is still a chance that these companies produce electricity. For
this reason, we have mapped it to power
with
borderline = TRUE
. Finally, code D35.1 corresponds to the
wider electricity sector, covering “Electric power generation,
transmission and distribution”. This is a case where the granularity is
insufficient to unambiguously match the code to the power
sector as defined in PACTA, and thus the borderline
flag is
set to TRUE
.
In practice, if a company has a sector code with
borderline
is TRUE
and is matched,
then consider the company in scope. If it has a borderline
value TRUE
and is not matched, then consider it
out of scope.
If you can obtain sector codes in any one of those classification systems for your loan book, it is strongly recommended using them. If you do not have your loans mapped to any of these systems, you can either run the matching without a sector classification (see above), or you can use a custom sector classification system.
If you want to use your own custom sector classification system, you
will need to create a file that follows the same structure as the sector
classification files in sector_classifications
, that is,
the file needs to contain the following columns (of the following
types):
sector
: <character>borderline
: <logical>code
: <character>code_system
: <character>Please notice that when manually preparing a custom sector
classification file, you need to decide for every code of your mapping
system which PACTA sector the code maps to. This requires the system to
be relatively granular to capture the nuances of the supply chains of
the companies in your loan book. As discussed above, any sector code
that is not unambiguously in scope or out of scope for a given PACTA
sector should be marked TRUE
in the borderline
column. Any sector code that is clearly not in scope should be marked
FALSE
and receive the sector value
not in scope
. Any sector code that is clearly in scope
should be marked FALSE
for borderline
and
receive the PACTA sector value that it maps to in the
sector
column.
ADD EXAMPLE
{pacta.loanbook}
allows you to match loans from your
loanbook to the companies in an asset-based company dataset. However,
matching every loan is unlikely – some loan-taking companies may be
missing from the asset-based company dataset, or they may not operate in
the sectors PACTA focuses on (power, cement, oil and gas, aviation,
coal, automotive, steel, and hdv). Thus, you may want to measure how
much of the loanbook matched some asset. This article shows two ways to
calculate such matching coverage:
Calculate the portion of your loanbook
covered, by
dollar value (i.e. using one of the loan_size_*
columns).
Count the number of companies matched.
First we will need to load the additional useful package
{ggplot2}
:
We will use the example datasets created in the previous section. To
demonstrate our point, we create a loanbook
dataset with
two mismatching loans:
loanbook <- loanbook %>%
mutate(
name_ultimate_parent =
ifelse(id_loan == "L1", "unmatched company name", name_ultimate_parent),
sector_classification_direct_loantaker =
ifelse(id_loan == "L2", "99", sector_classification_direct_loantaker)
)
We will then run the matching algorithm on this loanbook:
matched <- loanbook %>%
match_name(abcd) %>%
prioritize()
#> Warning: Some `sector_classification_direct_loantaker` are unknown:99
#> ℹ If you wish to request a new sector classification system, please create an issue here:
#> • https://github.com/RMI-PACTA/r2dii.data/issues
#> ℹ Or contact the PACTA team here:
#> • [email protected]
Note that this matched
dataset will contain
only loans that were matched successfully. To determine
coverage, we need to go back to the original loanbook
dataset. We must determine the PACTA sectors of each loan, as dictated
by the sector_classification_direct_loantaker
column.
For this, we join the loanbook
with the
sector_classifications
dataset, which lists all sector
classification code standards used by ‘PACTA’. Unfortunately we need to
work around two caveats (you may ignore them because they are
conceptually uninteresting):
In the two datasets, the columns we want to merge by have
different names. We use the argument by
of
dplyr::left_join()
to merge the columns
sector_classification_system
and
sector_classification_direct_loantaker
(from
loanbook
) with the columns code_system
and
code
(from sector_classifications
),
respectively.
In the two datasets, the sector classification codes are
represented with different data-types. We modify the column
sector_classification_direct_loantaker
before using
dplyr::left_join()
so it has the same type as the
corresponding column code
(otherwise
dplyr::left_join()
throws an error), and again after
dplyr::left_join()
to restore its original type.
merge_by <- c(
sector_classification_system = "code_system",
sector_classification_direct_loantaker = "code"
)
loanbook_with_sectors <- loanbook %>%
left_join(sector_classifications, by = merge_by) %>%
mutate(sector_classification_direct_loantaker = as.character(sector_classification_direct_loantaker))
We can join these two datasets together, to generate our
coverage
dataset:
coverage <- left_join(loanbook_with_sectors, matched) %>%
mutate(
loan_size_outstanding = as.numeric(loan_size_outstanding),
loan_size_credit_limit = as.numeric(loan_size_credit_limit),
matched = case_when(
score == 1 ~ "Matched",
is.na(score) ~ "Not Matched",
TRUE ~ "Not Matched"
),
sector = case_when(
borderline == TRUE & matched == "Not Matched" ~ "not in scope",
TRUE ~ sector
)
)
#> Joining with `by = join_by(id_loan, id_direct_loantaker, name_direct_loantaker,
#> id_ultimate_parent, name_ultimate_parent, loan_size_outstanding,
#> loan_size_outstanding_currency, loan_size_credit_limit,
#> loan_size_credit_limit_currency, sector_classification_system,
#> sector_classification_direct_loantaker, lei_direct_loantaker,
#> isin_direct_loantaker, sector, borderline)`
From the coverage
dataset, we can calculate the total
loanbook coverage by dollar value. Let’s create two helper functions,
one to calculate dollar-value and another one to plot coverage in
general.
dollar_value <- function(data, ...) {
data %>%
summarize(loan_size_outstanding = sum(loan_size_outstanding), .by = c(matched, ...))
}
plot_coverage <- function(data, x, y) {
ggplot(data) +
geom_col(aes({{x}}, {{y}}, fill = matched)) +
# Use more horizontal space -- avoids overlap on x axis text
theme(legend.position = "top")
}
Let’s first explore all loans.
To calculate the total, in-scope, loanbook coverage:
coverage %>%
filter(sector != "not in scope") %>%
dollar_value() %>%
plot_coverage(matched, loan_size_outstanding)
You may break-down the plot by sector:
Or even further, by matching level:
coverage %>%
mutate(matched = case_when(
matched == "Matched" & level == "direct_loantaker" ~ "Matched DL",
matched == "Matched" & level == "intermediate_parent_1" ~ "Matched IP1",
matched == "Matched" & level == "ultimate_parent" ~ "Matched UP",
matched == "Not Matched" ~ "Not Matched",
TRUE ~ "Catch unknown"
)) %>%
dollar_value(sector) %>%
plot_coverage(sector, loan_size_outstanding)
You might also be interested in knowing how many companies in your
loanbook were matched. It probably makes most sense to do this at the
direct_loantaker
level:
companies_matched <- coverage %>%
summarize(no_companies = n_distinct(name_direct_loantaker), .by = c(sector, matched))
companies_matched %>%
plot_coverage(sector, no_companies)
A loan is considered misclassified, if it is identified as in-scope of a PACTA sector based on the sector classification in the input loan book, but if it turns out that this classification is incorrect based on the counterparty’s activities. For example, if a counterparty is classified as a fully in-scope power company, but in reality it is a power distribution company or even active in another sector altogether, such as upstream gas production, the counterparty is misclassified. Misclassification is a problem, because it leads to exaggeratedly low match success rates, that cannot be improved by iterating the matching process. You can search as long as you want, but a company that in reality does not operate any power plants will not be found in the ABCD regardless of how the settings of the matching algorithm are tweaked. The match success rate can only be a useful metric if the denominator, that is the classification in the loan book, is solid. This is also important to understand which loans are truly not matched. This category should only be companies that actually are active in a given in-scope sector, but were not found in the ABCD.
Identifying misclassified loans is usually a manual process, where counterparties that have not been matched are researched to determine if they should be in scope or not. Depending on the size of the loan book, this can be a time-consuming process. It is recommended to start with the unmatched counterparties with the largest loans, as these have the largest impact on the match success rate. You may then have to decide for yourself, if you want to research all unmatched loans or set a cutoff point, e.g. based on the size of the loan.
Once the list of misclassified loans is identified, there are two broad options to appropriately these cases.
"not in scope"
, it will not
inflate the denominator of the match success rate anymore.loans_to_remove.csv
, which should
include the column id_loan
to indicate the precise
misclassified loan that was identified. This loan then needs to be
explicitly removed from the match success calculation, as shown
below.Let’s assume the the loan with id L255 is falsely classified as a steel company. Research shows this company is not a primary steel producer, but it makes steel-based products for end use. Hence it is active downstream in the value chain and cannot be matched to the ABCD. If we cannot correct the input sector classification, we instead create a csv file with the id of the misclassified loan. In practice, this can be done programmatically or manually.
readr::write_csv(
x = tibble(id_loan = "L255"),
file = file.path(tempdir(), "loans_to_remove.csv")
)
The loans_to_remove.csv file should have just one column, id_loan, and therefore be structured as follows:
id_loan
: <character>We then read in the file and remove the misclassified loan from the match success rate calculation. We compare the coverage before and after the removal of the misclassified loan to see the impact of the correction. For demonstration purposes, we artificially increase the loan_size_outstanding of the affected counterparty while focusing on the steel sector only, to see the clear impact.
coverage_compare <- coverage %>%
mutate(
loan_size_outstanding = ifelse(id_loan == "L255", 50000000, loan_size_outstanding)
) %>%
filter(sector == "steel")
coverage_compare %>%
mutate(matched = case_when(
matched == "Matched" & level == "direct_loantaker" ~ "Matched DL",
matched == "Matched" & level == "intermediate_parent_1" ~ "Matched IP1",
matched == "Matched" & level == "ultimate_parent" ~ "Matched UP",
matched == "Not Matched" ~ "Not Matched",
TRUE ~ "Catch unknown"
)) %>%
dollar_value(sector) %>%
plot_coverage(sector, loan_size_outstanding)
loans_to_remove <- readr::read_csv(file.path(tempdir(), "loans_to_remove.csv"))
coverage_corrected <- coverage_compare %>%
anti_join(loans_to_remove, by = "id_loan")
coverage_corrected %>%
mutate(matched = case_when(
matched == "Matched" & level == "direct_loantaker" ~ "Matched DL",
matched == "Matched" & level == "intermediate_parent_1" ~ "Matched IP1",
matched == "Matched" & level == "ultimate_parent" ~ "Matched UP",
matched == "Not Matched" ~ "Not Matched",
TRUE ~ "Catch unknown"
)) %>%
dollar_value(sector) %>%
plot_coverage(sector, loan_size_outstanding)
The reason why it is a good idea to either correct misclassified loans or disregard them in the calculation of the match success rate is that a misclassified loan cannot possibly be matched in a given sector. Therefore, no amount of work would be sufficient to improve the sector match success rate, because it is calculated against an incorrect baseline. Technically, the user is not forced to correct misclassifications, and there may be a limit to how much time should be spent on this, but it is recommended to at least correct large misclassified loans.
…
You can calculate scenario targets using two different approaches: Market Share Approach, or Sectoral Decarbonization Approach.
The Sectoral
Decarbonization Approach is used to calculate scenario targets for
the emission_factor
of a sector. For example, you can use
this approach to set targets for the average emission factor of the
cement sector. This approach is recommended for sectors lacking
technology roadmaps.
# Use this dataset to practice but eventually you should use your own data.
co2 <- co2_intensity_scenario_demo
sda_targets <-
target_sda(
data = prioritized_matches,
abcd = abcd,
co2_intensity_scenario = co2,
region_isos = regions
) %>%
filter(sector == "cement", year >= 2020)
#> Warning: Removing rows in abcd where `emission_factor` is NA
sda_targets
#> # A tibble: 110 × 6
#> sector year region scenario_source emission_factor_metric emission_factor_value
#> <chr> <dbl> <chr> <chr> <chr> <dbl>
#> 1 cement 2020 advanced economies demo_2020 projected 0.0230
#> 2 cement 2020 developing asia demo_2020 projected 0.0595
#> 3 cement 2020 global demo_2020 projected 0.664
#> 4 cement 2021 advanced economies demo_2020 projected 0.0232
#> 5 cement 2021 developing asia demo_2020 projected 0.0594
#> 6 cement 2021 global demo_2020 projected 0.665
#> 7 cement 2022 advanced economies demo_2020 projected 0.0233
#> 8 cement 2022 developing asia demo_2020 projected 0.0592
#> 9 cement 2022 global demo_2020 projected 0.666
#> 10 cement 2023 advanced economies demo_2020 projected 0.0235
#> # ℹ 100 more rows
Or at the company level:
sda_targets_company <-
target_sda(
data = prioritized_matches,
abcd = abcd,
co2_intensity_scenario = co2,
region_isos = regions,
by_company = TRUE # Output results at company-level
)
#> Warning: Removing rows in abcd where `emission_factor` is NA
sda_targets_company
#> # A tibble: 966 × 7
#> sector year region scenario_source name_abcd emission_factor_metric emission_factor_value
#> <chr> <dbl> <chr> <chr> <chr> <chr> <dbl>
#> 1 cement 2020 advanced economies demo_2020 Milani, Milani e Milani SPA projected 0.714
#> 2 cement 2020 developing asia demo_2020 Heidrich GmbH projected 0.675
#> 3 cement 2020 developing asia demo_2020 Hovel Hovel GmbH & Co. KGaA projected 0.622
#> 4 cement 2020 global demo_2020 Battaglia-Battaglia Group projected 0.626
#> 5 cement 2020 global demo_2020 Bergstrom, Bergstrom and Bergstrom projected 0.607
#> 6 cement 2020 global demo_2020 Carsten GmbH & Co. KG projected 0.625
#> 7 cement 2020 global demo_2020 Crist-Crist projected 0.655
#> 8 cement 2020 global demo_2020 Durgan-Durgan projected 0.702
#> 9 cement 2020 global demo_2020 Eberhardt Eberhardt GmbH projected 0.685
#> 10 cement 2020 global demo_2020 Geissler Geissler AG & Co. KGaA projected 0.637
#> # ℹ 956 more rows
These results can be saved/written to a CSV file for potential input to other models and software, for example:
readr::write_csv(x = sda_targets, file = file.path(tempdir(), "sda_targets.csv"))
readr::write_csv(x = sda_targets_company, file = file.path(tempdir(), "sda_targets_company.csv"))
Remember to replace tempdir()
with the path to the
directory that you wish to save your files in.
There are a large variety of possible visualizations stemming from
the outputs of target_market_share()
and
target_sda()
. Below, we highlight a couple of common plots
that can easily be created using the {pacta.loanbook}
package.
From the SDA output, we can compare the projected average emission intensity attributed to the portfolio, with the actual emission intensity scenario, and the scenario compliant SDA pathway that the portfolio must follow to achieve the scenario ambition by 2050.
ADD INSTRUCTIONS Write to file
PREVIOUS CHAPTER: Preparatory Steps
NEXT CHAPTER: Interpretation of Results