Suggestions on Expansion and Modification of Soundscape Music-likeness Indicator: Study on Music 1/f ^γ Behavior of Two Cultures

G B Chen, Y Z Zhao

State Key Laboratory of Subtropical Building Science, School of Architecture, South China University of Technology, No. 381, Wushan Road, Tianhe District, Guangzhou, Guangdong Province, China

Summary: Music-likeness is an indicator for evaluating the temporal structure of environmental soundscapes, in which fuzzy set membership functions are constructed on the basis of the 1/f ^γ behavior of selected music fragments. The current study analyzes the temporal dynamics of intensity and pitch fluctuation sequences of Western Classical Music and Chinese National Music by linearly fitting their fluctuation spectra. Each sample is fitted separately in a low- and a high-frequency region, yielding eight parameters in total (slope and deviation for each variable and region). Data comparison and statistical results show that the temporal structures of these music fragments tend towards 1/f behavior on the whole, but the probabilistic distributions of several parameters differ significantly between the two cultures. Finally, two suggestions on the expansion and modification of music-likeness are made on the basis of the analytical results, and the fuzzy set membership functions are updated correspondingly.

Key words: 1/f ^γ behavior; temporal structure; Western Classical Music; Chinese National Music; music-likeness

1. Introduction

1/f noise was first measured by J. B. Johnson in 1925 in the spectrum of a thermionic tube, where it is known as “flicker noise” [1], and was later found in many other systems [2]. Mathematically, 1/f noise is a nonstationary random process whose mechanism has been explained by a variety of models, such as the simple exponential relaxation law [3], self-organized criticality (SOC) [4] and the theory of fractals [5]. Nevertheless, no truly universal explanation has been established for the different systems in which 1/f behavior emerges.

Studies conducted by Voss and Clarke in the 1970s [6, 7] observed that approximate 1/f behavior emerges in the spectra of both intensity and pitch fluctuations of music and speech signals. They concluded that the temporal dynamics of common music signals generally follow the power law S(f )∝1/f ^γ, where γ∈(0, 2). On a log-log scale this becomes log S(f )∝-γ∙log f, a linear relation with slope -γ. When γ→0 (white noise) or γ→2 (brown noise), the corresponding pieces of music are perceived as “too chaotic/too unpredictable” or “boring/dull”, respectively, whereas 1/f music, with some but not strong correlation over all time scales, is judged far more interesting than either. Jeong et al. [8] examined the perception of music with different temporal structures, showing that pieces with both 1/f melody and 1/f rhythm sound more pleasing than others, and that listeners’ electroencephalograms reveal less chaotic electrophysiological behavior. Later research indicates that the value of γ correlates strongly with the type of music [9, 10, 11]: “classical” and “new age” are usually closer to 1/f behavior than “rock & roll”, for example, implying that the power law 1/f ^γ may be useful for music clustering.

In 2003, Coensel et al. [12] followed Voss and Clarke’s method to analyze rural and urban soundscape recordings, finding 1/f behavior in the fluctuation spectra of A-weighted sound pressure, Zwicker loudness and instantaneous zero transitions (pitch). This phenomenon, as they explained, would originate from the complexity that emerges universally in the physical environment. According to the theory of complex systems, 1/f-related features can be expected in sonic environments generated by ecosystems and by natural and human social systems. Recent studies on related aspects, however, are scattered through a small body of literature, and each focuses on an individual kind of sound, such as wind [13, 14], human agglomeration [15] or traffic flow [16, 17]. Not surprisingly, environmental sounds do not always obey 1/f behavior strictly, and their temporal properties usually vary with conditions. For example, although both are induced by air flow, intrinsic turbulence shows approximate 1/f behavior, while wind-induced vegetation sound shows 1/f ² behavior [18]. Road traffic noise shows 1/f behavior only when the traffic flow is near saturation [16]; a sparse flow leads to γ→0 because vehicles pass too randomly [19], and a traffic jam, conversely, leads to γ→2 [20].

Therefore, it can be expected that the temporal structure of the sonic environment (soundscape) also obeys the power law 1/f ^γ, just as music does, and the remaining problem lies in formulating a metric to describe it. In [12, 21] a single-value indicator, Music-likeness (ML), was developed to measure quantitatively how much a given audio signal (including the soundscapes of concern here) resembles music in its temporal aspect. ML is built on the linear-fitting results of the spectra of a number of music fragments, from which fuzzy set membership functions (FSMF) are constructed for the fitting slope, μ_A(α) (with α=-γ), and for the deviation, μ_E(ε). Their product gives the value of ML:

$$ML = \mu_A(\alpha) \cdot \mu_E(\varepsilon) \qquad (1)$$

where μ_A(α) and μ_E(ε)∈(0, 1), and then ML∈(0, 1). The closer ML is to 1, the higher the likeness degree between the given signal and music is.

As a matter of fact, the music-soundscape analogy that ML implies was already advocated by R. M. Schafer, the father of acoustic ecology, who wrote [22]:

“Throughout this book I am going to treat the world as a macrocosmic musical composition.”;

“There can be little doubt then that music is an indicator of the age……For some time I have also believed that the general acoustic environment of a society can be read as an indicator of social conditions which produce it……”;

“……the audible metrical divisions of heart, breath and foot, as well as the conservational actions of the nervous system, must be our guide against which we arrange all the other fortuitous rhythms of the environment around us……The environment contains many sets of rhythms……Though these may not provide audible pulsations, they do have powerful implications for the changing soundscape.”

From Schafer’s point of view, the rhythmic and melodic elements existing in the physical environment (including the soundscape) coincide with each other, so researchers can observe listeners’ general impression of sonic dynamics over a certain range of time scales through their concrete semantic descriptions. A large number of soundscape field surveys and laboratory listening tests on environmental sounds have been conducted in the past decade, such as Raimbault et al. [23], Axelsson et al. [24], Davies et al. [25], Kawai et al. [26] and Zeitler and Hellbrück [27]. Their results show statistically that principal components or major factors associated with “dynamics”, “temporal change” and similar subjective indices such as “vibrancy”, “eventfulness” and “activity” explain the second to fourth largest portion of the total variance, after those associated with “pleasantness” and “calm”, which indicates the considerable weight of the temporal aspect in soundscape evaluation.

The proposal of ML can be seen as a specific application of Schafer’s music-soundscape analogy, but its effectiveness and reliability still need to be assessed through investigation. Coensel [21] carried out preliminary research on this topic with a field survey in which passers-by were asked to describe their feeling of tempo in ten urban soundscapes. The result shows some correlation between ML1 (ML in the low-frequency region) and the subjective index “music-like”. However, the scatter in the data is considerable, owing to factors such as individual interest, changes in the sonic environment and sample size.

[28] and [29] propose a multi-criteria assessment for quiet rural soundscapes that includes ML and five other indicators: pleasantness evaluation based on the semantic differential method, non-fitting sounds, L_A50, the noise event count N_cn (the number of sound events that exceed L_A50 by 3 dBA) and log₁₀G (where G is the centre of gravity of the spectrum). This set of criteria can not only serve as a guideline for soundscape design and creation, but also facilitate environmental monitoring and policy planning.

The task of the current study is to calculate the temporal structure of Western Classical Music (WCM) and Chinese National Music (CNM), compare the respective results, analyze them statistically, and identify the similarities and differences between them. Voss and Clarke [7] already anticipated that the music and speech of different cultures would have individual features; we therefore reconsider the selection of music samples for ML more carefully and discuss the need for samples from multiple cultural backgrounds. In addition, we examine the 1/f ^γ behavior of both intensity and pitch variations, because these are two of the fundamental elements of sound and may directly influence the auditory impression. Pitch, however, has not yet been incorporated into the ML assessment.

The rest of this paper is arranged as follows. Section 2 gives the information relevant to the calculation, including the selection and background of the music fragments and the procedure for determining their 1/f ^γ behavior. In Section 3 the parameters obtained with this procedure are arranged into distributional patterns, which are then statistically analyzed and discussed; the section focuses on the distributional characteristics of the parameters because the FSMF in ML depend strongly on them. In Section 4 suggestions on the expansion and modification of ML are made on the basis of the discussion in Section 3, and updated FSMF are constructed by fitting the distributional patterns of intensity and pitch, respectively.

2. Methodology

2.1 Music Samples

We select 120 music fragments in total, 61 of WCM and 59 of CNM, as samples for the study. They are stored as PCM files with 16-bit quantization and a 44.1 kHz sampling rate, and each lasts roughly 2-5 min. Table 1 lists the repertoires or genres of the two types of samples.

Table 1 Left: repertoires from which WCM samples are extracted with sample sizes of their own; right: genres of CNM samples with sample sizes of their own

WCM repertoire	Samples	CNM genre	Samples
Adagio for Strings - S. Barber	4	Traditional instrument (ancient tunes, dance etc.)	37
Appalachian Spring - A. Copland	8	Opera (Beijing, Shaanxi and Kunqu)	22
Brandenburg Concerto No. 1 - J. S. Bach	3		
Orchestral Suite No. 2 - J. S. Bach	7		
Orchestral Suite No. 3 - J. S. Bach	5		
Piano Concerto No. 2 - S. Rachmaninov	10		
Requiem - W. A. Mozart	13		
Four Seasons - A. Vivaldi	11		
Total	61	Total	59

The WCM fragments are extracted from the movements of the cycles and suites used in [21] (see its Appendix A), covering genres such as the concerto (Piano Concerto No. 2 in C minor, Op. 18, Rachmaninov), the orchestral suite (Orchestral Suite No. 2 in B minor, BWV 1067, Bach) and the mass (Requiem, Mozart). The compositions date from the 17th to the 20th century and cover the three eras of Baroque, Classical and Romantic music. They are mainly instrumental ensemble works for brass, woodwind, strings and percussion, except that some parts of the Requiem include vocal solo or chorus.

The CNM fragments contain two major genres: traditional instrumental music and Chinese opera. The former consists of famous ancient tunes such as A Moonlit Night on the Spring River, The Moon Reflected in Two Springs and Guangling Tune, performed solo on zither, lute or urheen, together with some ensemble pieces, for example Qin’s Dance of Cavalry. The latter contains pieces of Beijing, Shaanxi and Kunqu Opera, dominated by vocal solo with accompaniment. CNM has been inherited and developed over hundreds or even thousands of years and strongly represents the spirit of traditional Chinese culture. The ancient tunes, for example, relate closely to the metres of poetry, and their rhythm and melody strongly reflect the speech characteristics of Chinese.

2.2 Calculation Procedure

We use MATLAB for signal analysis and numerical calculation, as follows.

The 1/f ^γ behaviors observed in [6] and [7] are derived not from the amplitudes of the original signals but from those of their short-time fluctuations. Firstly, discrete sequences of short-time intensity and pitch are both determined at a sampling rate of 100Hz (an interval of 10ms). The relative A-weighted equivalent level L_Aeq (the absolute value is unknown) is employed to describe intensity variation: the signal first passes an A-weighting FIR filter designed by frequency sampling, and the equivalent level of each interval is then computed. The pitch indicator is defined as log₁₀P, where P, in Hz, stands for the “Correlogram-based Pitch Measures” in Slaney’s MATLAB toolbox for auditory modeling work [30]. Unlike previous studies [7, 12], which employed zero transitions Z, we believe that log₁₀P gives a more accurate pitch estimate.
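The short-time level step can be illustrated with a minimal Python sketch (the study itself uses MATLAB). It assumes the input has already been A-weighted, so no weighting filter is applied here; the function name `short_time_levels`, the `frame_rate` argument and the silence floor of 1e-12 are illustrative choices, not taken from the original procedure.

```python
import math

def short_time_levels(samples, fs, frame_rate=100):
    """Relative equivalent level (dB) of consecutive, non-overlapping frames.

    Assumption: `samples` is already A-weighted; this sketch does no filtering.
    """
    frame_len = int(round(fs / frame_rate))  # 441 samples at 44.1 kHz = 10 ms
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        mean_square = sum(x * x for x in frame) / frame_len
        # Hypothetical floor that avoids log10(0) on digital silence.
        levels.append(10.0 * math.log10(max(mean_square, 1e-12)))
    return levels
```

Because only the fluctuation of the level matters for the later spectral fit, the missing absolute calibration shifts every value by the same constant and does not affect the slope.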

Secondly, the DC component of the sequence is removed and the Fourier Transform X(f ) is computed, so that the power spectrum can be derived as:

$$S(f) = X(f) \cdot X^{*}(f) = |X(f)|^{2} \qquad (2)$$

where X(f ) is the Fourier Transform of either variable (L_Aeq or log₁₀P), and “*” denotes the complex conjugate.
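As a rough illustration of this step, the sketch below removes the mean and evaluates the power at each positive frequency with a plain discrete Fourier transform. The function name `power_spectrum` is hypothetical, and the direct O(n²) sum is used only to keep the example dependency-free; an FFT routine would be the practical choice for sequences of several minutes at 100Hz.

```python
import cmath
import math

def power_spectrum(sequence, fs=100.0):
    """Return (frequencies, power) with power = X(f) * conj(X(f)) for f > 0."""
    n = len(sequence)
    mean = sum(sequence) / n
    centered = [x - mean for x in sequence]  # remove the DC component
    freqs, power = [], []
    for k in range(1, n // 2 + 1):
        # Direct DFT sum for bin k (slow but self-contained).
        xk = sum(c * cmath.exp(-2j * math.pi * k * i / n)
                 for i, c in enumerate(centered))
        freqs.append(k * fs / n)
        power.append((xk * xk.conjugate()).real)
    return freqs, power
```

No normalization is applied, which is harmless here because a constant factor only shifts the log-log spectrum vertically.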

Thirdly, the coordinates are transformed to a log-log scale, and the spectrum is resampled to 12 points per octave [21]. In the low-frequency region I₁=[0.017Hz, 0.2Hz] (time scale I’₁=[5s, 60s]), where the spacing of the spectral lines is larger than 1/12 octave, interpolation must be used; in the high-frequency region I₂=[0.2Hz, 5Hz] (time scale I’₂=[0.2s, 5s]), where the spacing is small enough, the spectrum is instead smoothed by averaging the energy of all points within each 1/12 octave. The cut-off point of 5s between I’₁ and I’₂ corresponds to the length of a phrase in musicology [21].
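The smoothing used in the high-frequency region can be sketched as a band average. This is only an illustration under assumptions: the function name, the use of the geometric band centre as the representative frequency, and the handling of the band edges are not specified in the text, and bands that receive no spectral line are simply skipped (in I₁ they would need interpolation instead).

```python
import math

def fractional_octave_average(freqs, power, f_low, f_high, bands_per_octave=12):
    """Average power within consecutive 1/bands_per_octave-octave bands."""
    n_bands = int(math.ceil(bands_per_octave * math.log2(f_high / f_low) - 1e-9))
    sums = [0.0] * n_bands
    counts = [0] * n_bands
    for f, s in zip(freqs, power):
        if f_low <= f < f_high:
            band = min(int(bands_per_octave * math.log2(f / f_low)), n_bands - 1)
            sums[band] += s
            counts[band] += 1
    centers, means = [], []
    for band in range(n_bands):
        if counts[band]:  # empty bands are skipped in this sketch
            centers.append(f_low * 2 ** ((band + 0.5) / bands_per_octave))
            means.append(sums[band] / counts[band])
    return centers, means
```

Averaging in equal-ratio bands gives every octave the same number of points, so the subsequent linear fit is not dominated by the densely sampled upper end of the spectrum.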

Finally, linear fitting of the intensity and pitch fluctuation spectra in I₁ and I₂ yields the slopes α_L₁, α_L₂, α_P₁, α_P₂ and the RMS deviations ε_L₁, ε_L₂, ε_P₁, ε_P₂, where the subscripts L and P stand for L_Aeq and log₁₀P, and 1 and 2 for I₁ and I₂, respectively.
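A least-squares fit on the log-log spectrum can be sketched as below. The function name `fit_loglog` is hypothetical, and the RMS deviation is assumed to be measured in log₁₀ units of S(f ) around the fitted line; the text does not state the unit, so the numerical scale of ε in this sketch may differ from the one used in the study.

```python
import math

def fit_loglog(freqs, power):
    """Fit log10(S) = slope * log10(f) + c; return (slope, rms_deviation)."""
    xs = [math.log10(f) for f in freqs]
    ys = [math.log10(s) for s in power]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    # Assumed definition: root mean square of the residuals in log10 units.
    rms = math.sqrt(sum(r * r for r in residuals) / n)
    return slope, rms
```

Applied once per variable (L_Aeq, log₁₀P) and per region (I₁, I₂), such a fit would produce the eight parameters per fragment; an ideal 1/f spectrum gives a slope of -1 and zero deviation.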

3. Results

3.1 Probabilistic Distributions

Figure 1 Histograms of the probabilistic distributions of the spectral fitting parameters of WCM and CNM fragments; each subplot shows one parameter: (a) α_L₁; (b) α_P₁; (c) α_L₂; (d) α_P₂; (e) ε_L₁; (f) ε_P₁; (g) ε_L₂ and (h) ε_P₂

The construction of the FSMF in ML is based on probabilistic distributions, as introduced in Section 1; in this section we therefore discuss the properties of the whole sample set rather than the result for each signal. The histograms in Figure 1 show the distributional patterns of the data from all 120 music fragments. The values of slope and deviation are scattered over the ranges (-2, 0) and (0.10, 0.60), which are divided into 11 subranges with widths of 0.2 and 0.05, respectively, and the number of samples falling in each subrange is counted. Bars of different colour (black and white) distinguish the two types of music in each subplot for the comparison in the next step.

First of all, Figure 1 (a)~(d) shows that the bulk of the slope data is centered in the range close to -1 and that the counts decrease towards both sides, for both WCM and CNM, in both the I₁ and I₂ regions, and for both L_Aeq and log₁₀P. This result implies a tendency of the rhythm and melody of the selected music fragments towards 1/f behavior. Previous studies have found self-similarity or fractal geometry in the incidence of frequency intervals in Bach’s and Mozart’s compositions [31, 32], and later in ancient zither tunes of CNM [33]; these findings offer a preliminary explanation for the origin of 1/f behavior in the pitch variation of some masterpieces and agree well with the log₁₀P results of the current study. For L_Aeq variation, however, no comparable verification exists yet.

It is also evident that the probabilistic distributions of slope in I₂ are much more concentrated than those in I₁. The decrease away from the peak is monotonic in I₂, while in I₁ it is relatively irregular. These distributional characteristics can be observed in both intensity and pitch variation, with standard deviations of 0.51 and 0.40 in I₁ and of 0.25 and 0.19 in I₂, respectively. According to our work, it is therefore more reasonable to fit the spectrum separately in the low- and high-frequency regions, because fitting the whole spectrum would obliterate the distinction between the distributions in the two regions.

On the other hand, Figure 1 (e)~(h) shows that the deviation values are centered in different ranges in I₁ and I₂, (0.15, 0.60) and (0.10, 0.30) respectively, mainly because of the difference in processing introduced in Section 2.2. On the whole, the deviations in I₂, where the spectrum is smoothed, are much smaller than those in I₁, where it is interpolated.

3.2 Difference Analysis

The interpretation in 3.1 regards the slope and deviation distributions of WCM and CNM as similar to some extent; further statistical analysis, however, is necessary for a more reliable conclusion. In this part we use the Chi-square test to judge whether the distributions of the two types of music differ.

The subranges are defined as in 3.1. We take the number of samples in each subrange as an indicator of the distributional state and test the significance of the difference for each of the eight parameters. The results are shown in Table 2.

Table 2 Results of the Chi-square test for the significance of differences between the WCM and CNM probabilistic distributions

Parameter	Degrees of freedom	χ²	Significance level (p)
α_L₁	10	32.92	<0.005
α_P₁	9	5.87	>0.1
α_L₂	7	11.03	>0.1
α_P₂	5	4.34	>0.1
ε_L₁	9	5.29	>0.1
ε_P₁	8	8.00	>0.1
ε_L₂	4	18.27	<0.005
ε_P₂	3	11.50	<0.01

The degrees of freedom in the table equal (the number of music types-1)×(the number of subranges with nonzero samples-1). The results indicate that the distributions of α_L₁, ε_L₂ and ε_P₂ differ significantly (p<0.005 for α_L₁ and ε_L₂, p<0.01 for ε_P₂). For α_L₁, the data of WCM are skewed towards the range (-2, -1), while those of CNM are skewed towards (-1, 0). For ε_L₂ and ε_P₂, the values of CNM are on the whole slightly smaller than those of WCM; in other words, the spectra of CNM show better linearity than those of WCM. Most surprising, perhaps, is that no significant difference is found in the other five parameters, α_L₂, α_P₁, α_P₂, ε_L₁ and ε_P₁: in spite of the utter disparity between the regional and cultural backgrounds of the two types of music, they coincide closely in temporal aspect.
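The test statistic and the degrees-of-freedom rule above can be illustrated with a short sketch for a 2×k contingency table of histogram counts. The function name is hypothetical, the counts in the usage note are made-up illustrative numbers rather than data from this study, and the resulting χ² would still have to be compared against a Chi-square table for the significance level.

```python
def chi_square_two_samples(counts_a, counts_b):
    """Chi-square statistic and degrees of freedom for two binned samples."""
    # Keep only subranges that contain at least one sample from either group.
    pairs = [(a, b) for a, b in zip(counts_a, counts_b) if a + b > 0]
    total_a = sum(a for a, _ in pairs)
    total_b = sum(b for _, b in pairs)
    total = total_a + total_b
    chi2 = 0.0
    for a, b in pairs:
        column = a + b
        expected_a = total_a * column / total
        expected_b = total_b * column / total
        chi2 += (a - expected_a) ** 2 / expected_a
        chi2 += (b - expected_b) ** 2 / expected_b
    dof = (2 - 1) * (len(pairs) - 1)  # (music types - 1) x (nonzero subranges - 1)
    return chi2, dof
```

For example, with the hypothetical counts `[10, 20, 0]` and `[20, 10, 0]` the empty third subrange is dropped, leaving one degree of freedom.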

The difference in α_L₁ can be explained from two aspects. On the one hand, the textures of WCM compositions are commonly built from multiple voices, for example the polyphonic forms prevalent during the Renaissance and Baroque periods and the homophonic forms of the Classical and Romantic periods [34]. The complex, luxuriant combination of rhythmic lines leads to intensive fluctuation on the larger time scale (greater than 5s), so strong low-frequency energy emerges in the I₁ region and the slope tends to -2. CNM, on the contrary, is mostly composed as a single line with little accompaniment, leading to relatively weak low-frequency energy and a flatter spectrum. Rachmaninov’s Piano Concerto No. 2, scored for the two instrumental elements of piano and orchestra, has over twenty variations across its three movements, and in the current calculation α_L₁∈(-2, -1) is obtained in eight of its ten fragments. For ancient tunes in CNM such as Lofty Mountains and Flowing Water and Melody of Highland, which are commonly played solo on zither or lute, most fragments instead give α_L₁∈(-1, 0).

On the other hand, changes in performing dynamics also influence the slope in the I₁ region, in a way similar to the texture factor. CNM composition follows the principles of simplicity and gentleness and focuses on depicting melodic lines in detail, resulting in less intensive dynamics than in WCM [35]. Nevertheless, there are exceptions such as House of Flying Daggers, a work performed in monophonic form on the lute, whose dynamic variation is so rich that the values of α_L₁ for its two fragments reach -1.30 and -1.41, respectively.

The reason for the differences in the ε_L₂ and ε_P₂ distributions, regrettably, remains unclear, since no theoretical explanation for this problem exists yet. ML as introduced in Section 1 treats deviation as an important parameter: the better the linearity of the spectrum, the closer it is to 1/f behavior. Nevertheless, in our opinion the relatively small ε_L₂ and ε_P₂ of the CNM fragments shown in Figure 1 (g) and (h) cannot lead to the conclusion that the rhythmic or melodic structure of CNM is superior to that of WCM. Similar to the deviation evaluation in ML, Xu [36] defines a more flexible indicator called Reliability:

(3)

where ε stands for the RMS deviation in the band between f₁ and f₂ (f₂>f₁). According to Equation (3), M^# is determined both by the magnitudes of the deviations and by their distribution over the frequency region. The requirement M^#>500 set in [36] for the formation of 1/f noise is too rigorous for music signals in practice, but we still consider this indicator valuable to some extent.

4. Expansion and Modification of ML

4.1 Two Suggestions

As introduced in Section 1, the FSMF in ML for evaluating soundscape tempo are constructed by fitting the probabilistic distributions obtained from music fragments. According to the results and discussion in Section 3, the shapes of the FSMF can be influenced by the types of music selected, given the statistically significant differences we have found. Furthermore, comparing the temporal structures of intensity and pitch, we observe that they are not identical but each has characteristics of its own. We therefore make the following two suggestions on the expansion and modification of ML.

Firstly, the current ML is derived from 15 repertoires originating from Europe (including WCM, jazz and pop) [21], which represent western music culture and style only. As an indicator for evaluating sonic environments, however, ML should be applicable across regions, so it is necessary to supplement music samples from other regions and modify the FSMF in ML accordingly. It is impossible to cover all types or styles of music in the world, but it is sufficient to select representative samples. On the basis of the results of the current study, we recommend CNM as a suitable choice. With a historical and cultural accretion of thousands of years, CNM has long symbolized the ethnic character and spirit of the Orient. Its artistic status has also been recognized internationally: Kunqu and Beijing Opera, for example, were listed by UNESCO as Masterpieces of Intangible Heritage of Humanity in 2001 and 2010, respectively, which supports the feasibility of including CNM fragments in ML. Meanwhile, as found in Section 3, the partial difference between WCM and CNM in temporal aspect, such as that in α_L₁, objectively indicates the need for sample expansion.

Secondly, the current ML only evaluates intensity variation (L_Aeq), but we believe that a holistic description of soundscape temporal structure would benefit from supplementing other variables, such as pitch, roughness and sharpness. In this paper we use the log₁₀P fluctuation sequence to describe pitch fluctuation, and its results show distributions similar to those of L_Aeq. We consider the addition of pitch evaluation to ML both advisable and necessary. On the one hand, rhythm and melody, the two basic elements of texture, correspond roughly to intensity and pitch and shape listeners’ awareness of musical tempo. According to Schafer’s music-soundscape analogy, this perceptual pattern should also hold for environmental sounds, although this has not yet been fully verified. On the other hand, intensity and pitch variations are not synchronized most of the time, and listeners’ psychological structure of temporal perception and evaluation is likely to be multiple and complex. Extracting fundamental data on intensity, pitch and other variables should therefore benefit further research.

4.2 The Updated FSMF

In this part the distributional patterns shown in Figure 1 are fitted to reconstruct the slope and deviation FSMF of L_Aeq and log₁₀P. Following Coensel’s method [21], an asymmetric Gaussian function is employed to fit the slope patterns:

(4)

where σ_α₁ and σ_α₂ determine the steepness of the two sides, and ρ_α is the value of α at which the function reaches its maximum of 1. All of them are parameters to be determined by fitting. Small deviation, i.e. strict linearity, is considered a high degree of music-likeness; the membership on the side towards 0 is therefore fixed at 1, while the other side is fitted by Gaussian attenuation:

(5)

Equations (4) and (5) are identical to μ_A(α) and μ_E(ε) in Equation (1), respectively, except that the parameters are omitted in the latter. Figure 2 shows the FSMF of the current study together with those of Coensel’s work (L_Aeq only) for comparison.
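The two membership functions and their product can be sketched as follows. This is an illustration under stated assumptions, not the fitted functions of this study: the Gaussian is assumed to have the common form exp(-(x-ρ)²/(2σ²)), whose normalization may differ from the one in [21]; which of σ_α₁ and σ_α₂ belongs to which side is not stated in the text, so the sketch uses the neutral names `sigma_left` and `sigma_right`; `eps0` (the deviation below which membership stays at 1) is a hypothetical parameter name; and all numerical values in the usage note are placeholders.

```python
import math

def slope_membership(alpha, rho, sigma_left, sigma_right):
    """Asymmetric Gaussian: maximum 1 at alpha == rho, a separate width per side."""
    sigma = sigma_left if alpha < rho else sigma_right
    return math.exp(-((alpha - rho) ** 2) / (2.0 * sigma ** 2))

def deviation_membership(eps, eps0, sigma):
    """Membership fixed at 1 for small deviations, Gaussian attenuation above eps0."""
    if eps <= eps0:
        return 1.0
    return math.exp(-((eps - eps0) ** 2) / (2.0 * sigma ** 2))

def music_likeness(alpha, eps, slope_params, deviation_params):
    """ML as the product of the slope and deviation memberships."""
    return (slope_membership(alpha, *slope_params)
            * deviation_membership(eps, *deviation_params))
```

With placeholder parameters such as `slope_params=(-1.0, 0.5, 0.3)` and `deviation_params=(0.15, 0.1)`, a spectrum with slope -1 and a deviation of 0.1 receives ML = 1, and ML falls off as the slope moves away from ρ_α or the deviation grows.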

Figure 2 The FSMF patterns of ML constructed in the current work and in Coensel’s work: (a) slope in the I₁ region (for Coensel I_1C∈[0.002Hz, 0.2Hz]), (b) slope in the I₂ region, (c) deviation in the I₁ region (for Coensel I_1C∈[0.002Hz, 0.2Hz]) and (d) deviation in the I₂ region

The low-frequency limit of spectrum fitting in the current work is 0.017Hz, corresponding to a time scale of 60s (1min), while Coensel uses 0.002Hz, a time scale of 15min. This change of the time-frequency limit is made for two reasons.

One is the limited duration of the music fragments. Each WCM movement lasts roughly 2-5min; a signal formed by joining all the movements of one cycle or suite would be long enough, but the movements are discontinuous and to some extent independent of each other. Instrumental and opera compositions in CNM are usually several minutes long, with a maximum of no more than 10min. In our opinion it is therefore more suitable to take each single movement or tune as a sample, with a shorter time scale limit.

The other is the need for subjective evaluation in the future. A psychoacoustic laboratory experiment is being planned to evaluate environmental soundscape signals and to explore whether any relationship exists between ML and the temporal perception of sound. Unlike in a field survey, the duration of each signal used in such an experiment cannot be too long, and it is preliminarily set to 1min, as in the current study.

Figure 2 (a) and (b) shows that the slope values ρ_α at which the FSMF reach their maximum are not as close to -1 as previously predicted but are somewhat biased, which is related to sample size and repertoire selection. The FSMF of L_Aeq in I₁ shown in Figure 2 (a) is relatively symmetric, while that of Coensel in I_1C attenuates slowly in the region (-2, -1) and steeply in (-1, 0). This distinction is due to the inclusion of CNM samples, as discussed in Section 3. The probabilistic distributions of log₁₀P in both I₁ and I₂ are centered in the region (-1, 0), as shown in Figure 1 (b) and (d), so the corresponding FSMF in Figure 2 have similar characteristics. Figure 2 (c) and (d) shows that the deviations obtained by Coensel are on the whole larger than those of the current study, while the deviation FSMF of L_Aeq and log₁₀P do not differ greatly from each other.

5. Conclusion and Expectation

Research on the design, creation and management of the sonic environment has grown recently, shifting from traditional environmental noise control based on physical acoustics towards the soundscape approach, and criteria other than sound level have been developed successively. Temporal evaluation is one of these aspects, with a large number of problems still unsolved.

Based on the proposal of ML, the current study focuses on the problem of sample selection and, according to the analysis of the spectrum fitting data, considers it necessary to supplement music fragments from other cultures and to extend the evaluation to variables other than intensity. Nevertheless, this remains a methodological discussion rather than a verification of the effectiveness and reliability of the indicator.

Furthermore, how important a role the temporal factor plays in soundscape evaluation, in which way it shapes listeners’ general auditory impression and emotion, and whether ML is the best indicator are all very interesting issues on which little literature has expounded specific views. We expect satisfactory answers to them in future work.

References

[1] Johnson J B. The Schottky Effect in Low Frequency Circuits. Phys. Rev. 1925, 26: 71-85

[2] Milotti E. 1/f noise: a pedagogical review [Z]. 2002

[3] Schottky W. Small-shot effect and flicker effect [J]. Phys. Rev. 1926, 28: 74-103

[4] Bak P, Tang C, Wiesenfeld K. Self-organized criticality: an explanation of 1/f noise [J]. Phys. Rev. Lett. 1987, 59: 381-384

[5] Falconer K. Fractal geometry: mathematical foundations and applications [M]. John Wiley and Sons, Inc., Chichester, 1999

[6] Voss R F, Clarke J. 1/f noise in music: music from 1/f noise [J]. J. Acoust. Soc. Am. 1978, 63: 258-263

[7] Voss R F, Clarke J. 1/f noise in music and speech [J]. Nature. 1975, 258: 317-318

[8] Jeong J, Joung M K, Kim S Y. Quantification of emotion by nonlinear analysis of the chaotic dynamics of electroencephalograms during perception of 1/f music [J]. Biol. Cybern. 1998, 78: 217-225

[9] Ro W, Kwon Y H. 1/f noise analysis of songs in various genre of music [J]. Chaos, Solitons and Fractals. 2009, 42 (4): 2305-2311

[10] Jennings H D, Ivanov P C, Martins A D M, et al. Variance fluctuations in nonstationary time series: a comparative study of music genres [J]. Physica A: Statistical Mechanics and its Applications. 2004, 336 (3-4): 585-594

[11] Levitin D J, Chordia P, Menon V. Musical rhythm spectra from Bach to Joplin obey a 1/f power law[J]. Proceedings of the National Academy of Sciences, 2012

[12] Coensel B de, Botteldooren D and Muer T D. 1/f Noise in rural and urban soundscapes [J]. Acta Acustica United With Acustica. 2003, 89: 287-295

[13] Boersma H F. Characterization of the natural ambient sound environment: Measurements in open agricultural grassland. J. Acoust. Soc. Am. 1997, 101: 2104–2110

[14] Morgan S, Raspet R. Investigation of the mechanisms of low-frequency wind noise generation outdoors. J. Acoust. Soc. Am. 1992, 92: 1180–1183

[15] Ribeiro H V, Souza R T de, Lenzi E K et al. The soundscape dynamics of human agglomeration. New Journal of Physics. 2011, 13 (2): 1-8

[16] Zhang X, Hu G. 1/f noise in a two lane highway traffic model. Phys. Rev. E. 1995, 52: 4664–4668

[17] Chowdhury D, Schadschneider A. Self-organization of traffic jams in cities: Effects of stochastic dynamics and signal periods. Physical Review E. 1999, 59 Part A: 1311–1314

[18] Ostashev V E. Acoustics in moving inhomogeneous media. E&FN Spon, London, 1997. Chapter 6.

[19] Coensel B de et al. The influence of traffic flow dynamics on urban soundscapes [J]. Applied Acoustics. 2005, 66: 175-194

[20] Licitra G, Mamoli G, Botteldooren D et al. Traffic noise and perceived soundscapes: a case study [A]. Forum Acusticum’05 [C], Budapest, 2005

[21] Coensel B de. Introducing the temporal aspect in environmental soundscape research [D]. Belgium: Ghent University, 2007

[22] Schafer R M. The soundscape: our sonic environment and the tuning of the world [M]. USA: Destiny Books, Rochester, Vermont, 1994

[23] Raimbault M, Lavandier C, Bérengier M. Ambient sound assessment of urban environments: field studies in two French cities [J]. Applied Acoustics. 2003, 64: 1241-1256

[24] Axelsson Ö, Berglund B and Nilsson M E. Soundscape assessment [J]. J. Acoust. Soc. Am. 2005, 117: 2591-2592

[25] Davies W J, Adams M D, Bruce N S et al. Perception of soundscapes: an interdisciplinary approach [J]. Applied Acoustics. 2013, 74 (2): 224-231

[26] Kawai K, Kojima T, Hirate K, Yasuoka M. Personal evaluation structure of environmental sounds: experiments of subjective evaluation using subjects’ own terms [J]. J. Sound Vib. 2004, 277: 523-533

[27] Zeitler A, Hellbrück J. Semantic attributes of environmental sounds and their correlations with psychoacoustic magnitudes [A]. Proceedings of the 17th International Congress on Acoustics (ICA) [C], Rome, Italy, 2001

[28] Coensel B de and Botteldooren D. The quiet rural soundscape and how to characterize it [J]. Acta Acustica United With Acustica. 2006, 92: 887-897

[29] Botteldooren D and Coensel B de. Quality labels for the quiet rural soundscape [A]. Inter-Noise 2006 [C], Honolulu, Hawaii, USA, 2006

[30] Slaney, M. Auditory toolbox: A MATLAB toolbox for auditory modeling work (Apple Tech. Rep. No. 45). Cupertino, CA: Apple Computer, 1995

[31] Hsü K J and Hsü A. Fractal geometry of music [J]. Proc. Natl. Acad. Sci. USA. 1990, 87: 938-941

[32] Hsü K J and Hsü A. Self-similarity of the “1/f noise” called music [J]. Proc. Natl. Acad. Sci. USA. 1991, 88: 3507-3509

[33] Xiang K. Fractal geometry in ancient zither tunes (In Chinese). Article Education. 2006, 4: 122-123

[34] Benward B, Saker M. Music: In theory and practice, Volume 1. New York: McGraw-Hill, 8th edition, 2009

[35] Tian Q. The linear principle of Chinese Music (In Chinese). China Musicology. 1986, 4: 58-67

[36] Xu S L. What is the 1/f noise? (In Chinese). Technical Acoustics. 2008, 27 (4): 616-619