Mathematische Statistik

This entry is a review of the following book.

Mathematische Statistik
Claudia Czado, Thorsten Schmidt
ISBN-13: 9783642172601
280 pages

You can get a physical copy or an e-book at a cheaper price at (The discount requires an account on the site, but I am not sure that an account alone is enough to get it.)

The reason why I read the book

As you may have already guessed, I read it to learn German.

In fact, I first tried reading papers written in German, but I realised immediately that this was not efficient. The German in a mathematics paper is so clear that I need to spend more time understanding the mathematics than the German.

So I tried a book on undergraduate mathematics instead. I chose a book about complex analysis, because it should contain many mathematical terms from various areas. But this idea failed as well, because the book was too easy for me. In other words, it was boring.

So I decided to read a book about a theory with a slightly different flavour from my specialization. That was statistics. I had learned measure theory and probability theory, but I did not know much about statistics. (Of course it also matters to me that statistics is the basis of machine learning.)

There were several choices, but the first impression of this book was very good.

The chapters of the book

The book consists of 7 chapters: the first chapter covers probability theory and the rest cover the mathematics of statistics. As the title suggests, this book deals with the mathematical details of statistics rather than practical statistics. The book is so compact that it does not take long to read through.

Chapter 1

This chapter explains (very) briefly the fundamental facts of probability theory and some probability density functions. This book does not assume that the reader is familiar with measure theory. Does this sound good?

It is often a challenge to explain a theory T without another theory S if T is strongly based on S. One important principle for achieving this is not to expect the reader to gain a deep understanding, because a deep understanding requires the theory S.

Here is an example: show the linearity of the expected value. We cannot use measure theory, so we are able to define the expected value of a random variable in only two cases: a discrete random variable (its image is at most countable) and a continuous random variable (it has a pdf). So we need to prove that a linear combination of such variables is also discrete or continuous and this is ... If we understand measure theory, the linearity of the expected value is obvious.
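To make the difficulty concrete, here is a small simulation sketch. It is my own illustration, not an example from the book: the variables X and Y and the function names are assumptions chosen for the demonstration. Both X and Y have a pdf, but X + Y has an atom at 0 of mass about 1/2, so it is neither discrete nor continuous, and yet the sample means still add up.

```python
import random


def sample_pair(rng):
    """Draw (X, Y) from one uniform U on [0, 1).

    X = U is uniform on [0, 1).
    Y = U if U < 1/2, else -U, so Y has a density on [0, 1/2) and (-1, -1/2].
    """
    u = rng.random()
    x = u
    y = u if u < 0.5 else -u
    return x, y


def simulate(n=100_000, seed=0):
    """Return sample means of X, Y, X + Y and the share of exact zeros in X + Y."""
    rng = random.Random(seed)
    pairs = [sample_pair(rng) for _ in range(n)]
    sums = [x + y for x, y in pairs]
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    mean_sum = sum(sums) / n
    # For U >= 1/2 we get X + Y = U - U = 0 exactly: a point mass at 0.
    atom_at_zero = sum(1 for s in sums if s == 0.0) / n
    return mean_x, mean_y, mean_sum, atom_at_zero


if __name__ == "__main__":
    mean_x, mean_y, mean_sum, atom = simulate()
    print("E[X] + E[Y] ~", mean_x + mean_y)
    print("E[X + Y]    ~", mean_sum)
    print("P(X + Y = 0) ~", atom)
```

The theoretical values are E[X] = 1/2, E[Y] = -1/4 and E[X + Y] = 1/4, so linearity holds even though X + Y falls outside both of the elementary cases.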

Probably nobody cares about this problem, but I do, because I am interested in a minimal path to understanding things. In my opinion it is most important to understand what the theory is based on.

  1. Based on probability theory itself: concentrate only on notation which will be used in the book, so there should be no proofs. (Maybe §1.4 is a good example, even though it lacks examples.)
  2. Based on measure theory: introduce the necessary definitions and theorems without any proofs. For example, the key concept of measure theory in probability theory is the push-forward (Bildmaß). So give the theorem of the existence of the push-forward with a reference without any proof and use it.

I prefer the second option, but the first option can keep the book elementary. (Regarding the first option I recommend "Maß- und Integrationstheorie".)
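For reference, the statements that the second option would quote without proof could look roughly like this. This is a sketch in my own notation, not the book's: (\Omega, \mathcal{A}, P) denotes the probability space and X, Y are real-valued random variables.

```latex
% Push-forward (Bildmaß) of P under a measurable map X : \Omega \to \mathbb{R}
P_X(B) := P\bigl(X^{-1}(B)\bigr)
        = P(\{\omega \in \Omega : X(\omega) \in B\}),
\qquad B \in \mathcal{B}(\mathbb{R}).

% Change of variables, for measurable g with g \circ X integrable
E[g(X)] = \int_\Omega g(X(\omega)) \, dP(\omega)
        = \int_{\mathbb{R}} g(x) \, dP_X(x).

% Linearity of the expected value, for integrable X, Y and a, b \in \mathbb{R}
E[aX + bY] = \int_\Omega (aX + bY) \, dP
           = a \int_\Omega X \, dP + b \int_\Omega Y \, dP
           = a\,E[X] + b\,E[Y].
```

With these in hand, the discrete and the continuous case become two special forms of P_X, and linearity is just linearity of the integral.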

Chapter 2–6

These chapters deal with the mathematical theory of statistics: statistical models, estimators, confidence intervals and hypothesis tests. These chapters are very good, but there are several points which could be improved.

  • A sample space (ein Zustandsraum) should be rigorously defined.
  • Definition 2.1 is useless because it is not used in the book. (Moreover the letter L on the equal sign should be explained.)
  • More Bayesian theory. (This is just my wish.)
  • Easy consequences of calculus should be omitted.
  • The explanation of p-values is unclear. (The most important advantage of the p-value is that we do not need to care about the test function which is used.)
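The remark about p-values can be illustrated with a minimal sketch. It assumes a one-sided z-test with known standard deviation, which is my choice for illustration and not an example taken from the book; all function names are hypothetical. Once the p-value is computed, the decision at any level alpha is simply "reject if p <= alpha", and it agrees with the decision made from the test statistic and its critical value.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(sample, mu0, sigma):
    """Test statistic for H0: mu = mu0 against H1: mu > mu0, with sigma known."""
    n = len(sample)
    mean = sum(sample) / n
    return (mean - mu0) / (sigma / sqrt(n))


def p_value_one_sided(z):
    """P(Z >= z) for a standard normal Z."""
    return 1.0 - NormalDist().cdf(z)


def rejects_by_critical_value(z, alpha):
    """Classical decision rule: compare z with the (1 - alpha) quantile."""
    return z >= NormalDist().inv_cdf(1.0 - alpha)


def rejects_by_p_value(p, alpha):
    """Decision rule that needs only the p-value, not the test behind it."""
    return p <= alpha


if __name__ == "__main__":
    sample = [0.3, 1.1, 0.8, 1.4, 0.2, 0.9]  # made-up data for illustration
    z = z_statistic(sample, mu0=0.0, sigma=1.0)
    p = p_value_one_sided(z)
    for alpha in (0.01, 0.05, 0.10):
        print(alpha, rejects_by_p_value(p, alpha), rejects_by_critical_value(z, alpha))
```

A reader who is handed only p can make the decision for every alpha without knowing which statistic or critical region produced it.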

Chapter 7

This chapter discusses (linear) regression. In my opinion one of the biggest advantages of linear regression is that we can look at a statistical statement as a statement of linear algebra. However, I felt that the book does not make use of this advantage. I guess that the book assumes the reader is familiar with linear algebra, so the descriptions relating to linear algebra could be more concise and clear.
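As a small illustration of the linear-algebra view (my own sketch, not the book's presentation; the function names are hypothetical), the least-squares fit of a line solves the 2x2 normal equations, and the residual vector is orthogonal to both columns of the design matrix, that is, to the constant vector and to the vector of x-values.

```python
def fit_line(xs, ys):
    """Solve the normal equations (X^T X) beta = X^T y for the design matrix
    X with columns (1, x). Returns (intercept, slope)."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx  # determinant of X^T X
    if det == 0:
        raise ValueError("all x-values are equal; the fit is not unique")
    intercept = (sxx * sy - sx * sxy) / det
    slope = (n * sxy - sx * sy) / det
    return intercept, slope


def residuals(xs, ys, intercept, slope):
    """Residual vector y - X beta."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.1, 2.9, 5.2, 7.1, 8.8]  # made-up data for illustration
    b0, b1 = fit_line(xs, ys)
    res = residuals(xs, ys, b0, b1)
    # Both inner products vanish up to rounding: the fitted values are the
    # orthogonal projection of y onto the column space of X.
    print(sum(res), sum(r * x for r, x in zip(res, xs)))
```

The orthogonality of the residuals is exactly the statement that fitting is a projection, which is the kind of argument that lets statistical claims be read as linear algebra.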

By the way, the explanation of W_0 in §7.3 is wrong.


I recommend this book to anyone who has already learned probability theory, with the advice to skip Chapter 1. Then you can quickly learn the theoretical basics of statistics.

Categories: #data-mining