simpleperf: fix stat scale output.

Currently the simpleperf stat command creates an event file for each cpu
and scales the result by summing the counters from every cpu. But a thread
runs on only one cpu at a time, so the scaled numbers come out wrong.

Fix this with three changes:
1. For non system-wide stat, create only one event file for all cpus.
2. When summarizing counters, omit counters that have 0 running time.
3. Print the real value instead of the scaled value.
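
As an illustration of change 2, here is a minimal sketch of summarizing
counters while skipping those that never ran. It is not the code in this
change: PerfCounter and SummarizeCounters are hypothetical names, and the
fields assume the layout the kernel returns for an event opened with
PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical holder for one event file's reading (assumed layout:
// counter value, then enabled time and running time in nanoseconds).
struct PerfCounter {
  uint64_t value;
  uint64_t time_enabled;
  uint64_t time_running;
};

// Sum the readings, omitting counters with 0 running time so that their
// enabled time does not distort the running/enabled ratio.
PerfCounter SummarizeCounters(const std::vector<PerfCounter>& counters) {
  PerfCounter sum = {0, 0, 0};
  for (const PerfCounter& counter : counters) {
    if (counter.time_running == 0) {
      continue;  // This event was never scheduled; skip it.
    }
    sum.value += counter.value;
    sum.time_enabled += counter.time_enabled;
    sum.time_running += counter.time_running;
  }
  return sum;
}
```

With this filter, a counter that was enabled but never scheduled adds
nothing to the total, so the summed running time can match the summed
enabled time and the reported percentage is not dragged down.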

Run command:
$ simpleperf stat ./empty_program

Before the change:
Performance counter statistics:

     33,540,176  cpu-cycles                # 54.812986 GHz                     (2%)
     28,233,348  stalled-cycles-frontend   # 46.140 G/sec                      (2%)

After the change:
Performance counter statistics:

       625,335  cpu-cycles                # 1.404496 GHz                      (100%)
       507,200  stalled-cycles-frontend   # 1.139 G/sec                       (100%)

Change-Id: I76bc3e220df4f149ab365e960295b24fde8ae2fc
2 files changed